Using the PVH entry point, the uncompressed vmlinux is loaded at
LOAD_PHYSICAL_ADDR, and execution starts in 32bit mode, with paging
disabled, at the address given by XEN_ELFNOTE_PHYS32_ENTRY:
pvh_start_xen.
Loading at LOAD_PHYSICAL_ADDR has not been a problem in the past as
virtual machines don't have conflicting memory maps. But Xen now
supports a PVH dom0, which uses the host memory map, and there are
Coreboot/EDK2 firmwares that have reserved regions conflicting with
LOAD_PHYSICAL_ADDR. Xen recently added XEN_ELFNOTE_PHYS32_RELOC to
specify an alignment, minimum and maximum load address when
LOAD_PHYSICAL_ADDR cannot be used. This patch series makes the PVH
entry path PIC to support relocation.
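To make the meaning of the alignment / minimum / maximum values in
XEN_ELFNOTE_PHYS32_RELOC concrete, here is a purely illustrative sketch
of how a loader could apply them when picking a load address. This is
not Xen's actual code; the helper and parameter names are made up, and a
non-zero, power-of-two alignment is assumed:

    #include <stdint.h>

    /*
     * Hypothetical helper: apply the PHYS32_RELOC constraints (start
     * alignment, minimum start address, maximum address of the last
     * byte) to a candidate load address.  Returns 0 on success.
     */
    static int pick_load_addr(uint32_t candidate, uint32_t align,
                              uint32_t min_start, uint32_t max_last,
                              uint32_t image_size, uint32_t *out)
    {
            uint64_t addr = candidate < min_start ? min_start : candidate;

            /* Round the start address up to the required alignment. */
            addr = (addr + align - 1) & ~((uint64_t)align - 1);
            if (addr + image_size - 1 > max_last)
                    return -1;      /* no fit at or above the candidate */
            *out = (uint32_t)addr;
            return 0;
    }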
Only x86-64 is converted. The 32bit entry path calls into vmlinux,
which is not PIC, so it will not support relocation.
The entry path needs page tables to switch to 64bit mode. A new
pvh_init_top_pgt is added to make the transition into startup_64, where
the regular init_top_pgt page tables are set up. This duplication is
unfortunate, but it keeps the changes simpler. __startup_64() can't be
used to set up init_top_pgt for PVH entry because it is 64bit code - the
32bit entry code doesn't have page tables to use.
This is the straightforward implementation to make it work. Other
approaches could be pursued.
checkpatch.pl gives an error: "ERROR: Macros with multiple statements
should be enclosed in a do - while loop" about the moved PMDS macro.
But PMDS is an assembler macro, so it's not applicable. There are some
false positive warnings "WARNING: space prohibited between function name
and open parenthesis '('" about the macro, too.
Jason Andryuk (5):
xen: sync elfnote.h from xen tree
x86/pvh: Make PVH entrypoint PIC for x86-64
x86/pvh: Set phys_base when calling xen_prepare_pvh()
x86/kernel: Move page table macros to new header
x86/pvh: Add 64bit relocation page tables
arch/x86/kernel/head_64.S | 22 +---
arch/x86/kernel/pgtable_64_helpers.h | 28 +++++
arch/x86/platform/pvh/head.S | 157 +++++++++++++++++++++++++--
include/xen/interface/elfnote.h | 93 +++++++++++++++-
4 files changed, 265 insertions(+), 35 deletions(-)
create mode 100644 arch/x86/kernel/pgtable_64_helpers.h
--
2.44.0
The PVH entrypoint is 32bit non-PIC code running the uncompressed
vmlinux at its load address CONFIG_PHYSICAL_START - default 0x1000000
(16MB). The kernel is loaded at that physical address inside the VM by
the VMM software (Xen/QEMU).
When running a Xen PVH Dom0, the host reserved addresses are mapped 1-1
into the PVH container. There exist system firmwares (Coreboot/EDK2)
with reserved memory at 16MB. This creates a conflict where the PVH
kernel cannot be loaded at that address.
Modify the PVH entrypoint to be position-independent to allow flexibility
in load address. Only the 64bit entry path is converted. A 32bit
kernel is not PIC, so calls into other parts of the kernel, like
xen_prepare_pvh() and mk_pgtable_32(), don't work properly when
relocated.
This makes the code PIC, but the page tables need to be updated as well
to handle running from the kernel high map.
The UNWIND_HINT_END_OF_STACK is to silence:
vmlinux.o: warning: objtool: pvh_start_xen+0x7f: unreachable instruction
after the lret into 64bit code.
Signed-off-by: Jason Andryuk <[email protected]>
---
---
arch/x86/platform/pvh/head.S | 44 ++++++++++++++++++++++++++++--------
1 file changed, 34 insertions(+), 10 deletions(-)
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index f7235ef87bc3..bb1e582e32b1 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -7,6 +7,7 @@
.code32
.text
#define _pa(x) ((x) - __START_KERNEL_map)
+#define rva(x) ((x) - pvh_start_xen)
#include <linux/elfnote.h>
#include <linux/init.h>
@@ -54,7 +55,25 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
UNWIND_HINT_END_OF_STACK
cld
- lgdt (_pa(gdt))
+ /*
+ * See the comment for startup_32 for more details. We need to
+ * execute a call to get the execution address to be position
+ * independent, but we don't have a stack. Save and restore the
+ * magic field of start_info in ebx, and use that as the stack.
+ */
+ mov (%ebx), %eax
+ leal 4(%ebx), %esp
+ ANNOTATE_INTRA_FUNCTION_CALL
+ call 1f
+1: popl %ebp
+ mov %eax, (%ebx)
+ subl $rva(1b), %ebp
+ movl $0, %esp
+
+ leal rva(gdt)(%ebp), %eax
+ leal rva(gdt_start)(%ebp), %ecx
+ movl %ecx, 2(%eax)
+ lgdt (%eax)
mov $PVH_DS_SEL,%eax
mov %eax,%ds
@@ -62,14 +81,14 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
mov %eax,%ss
/* Stash hvm_start_info. */
- mov $_pa(pvh_start_info), %edi
+ leal rva(pvh_start_info)(%ebp), %edi
mov %ebx, %esi
- mov _pa(pvh_start_info_sz), %ecx
+ movl rva(pvh_start_info_sz)(%ebp), %ecx
shr $2,%ecx
rep
movsl
- mov $_pa(early_stack_end), %esp
+ leal rva(early_stack_end)(%ebp), %esp
/* Enable PAE mode. */
mov %cr4, %eax
@@ -84,28 +103,33 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
wrmsr
/* Enable pre-constructed page tables. */
- mov $_pa(init_top_pgt), %eax
+ leal rva(init_top_pgt)(%ebp), %eax
mov %eax, %cr3
mov $(X86_CR0_PG | X86_CR0_PE), %eax
mov %eax, %cr0
/* Jump to 64-bit mode. */
- ljmp $PVH_CS_SEL, $_pa(1f)
+ pushl $PVH_CS_SEL
+ leal rva(1f)(%ebp), %eax
+ pushl %eax
+ lretl
/* 64-bit entry point. */
.code64
1:
+ UNWIND_HINT_END_OF_STACK
+
/* Set base address in stack canary descriptor. */
mov $MSR_GS_BASE,%ecx
- mov $_pa(canary), %eax
+ leal rva(canary)(%ebp), %eax
xor %edx, %edx
wrmsr
call xen_prepare_pvh
/* startup_64 expects boot_params in %rsi. */
- mov $_pa(pvh_bootparams), %rsi
- mov $_pa(startup_64), %rax
+ lea rva(pvh_bootparams)(%ebp), %rsi
+ lea rva(startup_64)(%ebp), %rax
ANNOTATE_RETPOLINE_SAFE
jmp *%rax
@@ -143,7 +167,7 @@ SYM_CODE_END(pvh_start_xen)
.balign 8
SYM_DATA_START_LOCAL(gdt)
.word gdt_end - gdt_start
- .long _pa(gdt_start)
+ .long _pa(gdt_start) /* x86-64 will overwrite if relocated. */
.word 0
SYM_DATA_END(gdt)
SYM_DATA_START_LOCAL(gdt_start)
--
2.44.0
The PVH entry point is 32bit. For a 64bit kernel, the entry point must
switch to 64bit mode, which requires a set of page tables. In the past,
PVH used init_top_pgt.
This works fine when the kernel is loaded at LOAD_PHYSICAL_ADDR, as the
page tables are prebuilt for this address. If the kernel is loaded at a
different address, they need to be adjusted.
__startup_64() adjusts the prebuilt page tables for the physical load
address, but it is 64bit code. The 32bit PVH entry code can't call it
to adjust the page tables, so it can't readily be re-used.
64bit PVH entry needs page tables set up for identity map, the kernel
high map and the direct map. pvh_start_xen() enters identity mapped.
Inside xen_prepare_pvh(), it jumps through a pv_ops function pointer
into the highmap. The direct map is used for __va() on the initramfs
and other guest physical addresses.
Add a dedicated set of prebuilt page tables for PVH entry. They are
adjusted in assembly before loading.
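The per-table adjustment amounts to the following C sketch (illustrative
only; the real work is the assembly loop in the diff further down, and
the helper name here is made up):

    /* Add the load offset to every present entry of a 512-entry table. */
    static void fixup_page_table(uint64_t *table, uint64_t load_offset)
    {
            int i;

            for (i = 0; i < 512; i++)
                    if (table[i] & _PAGE_PRESENT)
                            table[i] += load_offset;
    }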
Add XEN_ELFNOTE_PHYS32_RELOC to indicate support for relocation
along with the kernel's loading constraints. The maximum load address,
KERNEL_IMAGE_SIZE - 1, is determined by a single pvh_level2_ident_pgt
page. It could be larger with more pages.
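(For reference, the arithmetic behind that limit: one page table page
holds 512 entries, and each PMD entry maps 2 MiB, so a single
pvh_level2_ident_pgt page covers 512 * 2 MiB = 1 GiB of identity-mapped
space - enough for KERNEL_IMAGE_SIZE, which is at most 1 GiB.)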
Signed-off-by: Jason Andryuk <[email protected]>
---
Instead of adding 5 pages of prebuilt page tables, they could be
constructed dynamically in the .bss area. They are then only used for
PVH entry and until transitioning to init_top_pgt. The .bss is later
cleared. It's safer to add the dedicated pages, so that is done here.
---
arch/x86/platform/pvh/head.S | 105 ++++++++++++++++++++++++++++++++++-
1 file changed, 104 insertions(+), 1 deletion(-)
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index c08d08d8cc92..4af3cfbcf2f8 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -21,6 +21,8 @@
#include <asm/nospec-branch.h>
#include <xen/interface/elfnote.h>
+#include "../kernel/pgtable_64_helpers.h"
+
__HEAD
/*
@@ -102,8 +104,47 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
btsl $_EFER_LME, %eax
wrmsr
+ mov %ebp, %ebx
+ subl $LOAD_PHYSICAL_ADDR, %ebx /* offset */
+ jz .Lpagetable_done
+
+ /* Fixup page-tables for relocation. */
+ leal rva(pvh_init_top_pgt)(%ebp), %edi
+ movl $512, %ecx
+2:
+ testl $_PAGE_PRESENT, 0x00(%edi)
+ jz 1f
+ addl %ebx, 0x00(%edi)
+1:
+ addl $8, %edi
+ decl %ecx
+ jnz 2b
+
+ /* L3 ident has a single entry. */
+ leal rva(pvh_level3_ident_pgt)(%ebp), %edi
+ addl %ebx, 0x00(%edi)
+
+ leal rva(pvh_level3_kernel_pgt)(%ebp), %edi
+ addl %ebx, (4096 - 16)(%edi)
+ addl %ebx, (4096 - 8)(%edi)
+
+ /* pvh_level2_ident_pgt is fine - large pages */
+
+ /* pvh_level2_kernel_pgt needs adjustment - large pages */
+ leal rva(pvh_level2_kernel_pgt)(%ebp), %edi
+ movl $512, %ecx
+2:
+ testl $_PAGE_PRESENT, 0x00(%edi)
+ jz 1f
+ addl %ebx, 0x00(%edi)
+1:
+ addl $8, %edi
+ decl %ecx
+ jnz 2b
+
+.Lpagetable_done:
/* Enable pre-constructed page tables. */
- leal rva(init_top_pgt)(%ebp), %eax
+ leal rva(pvh_init_top_pgt)(%ebp), %eax
mov %eax, %cr3
mov $(X86_CR0_PG | X86_CR0_PE), %eax
mov %eax, %cr0
@@ -197,5 +238,67 @@ SYM_DATA_START_LOCAL(early_stack)
.fill BOOT_STACK_SIZE, 1, 0
SYM_DATA_END_LABEL(early_stack, SYM_L_LOCAL, early_stack_end)
+#ifdef CONFIG_X86_64
+/*
+ * Xen PVH needs a set of identity mapped and kernel high mapping
+ * page tables. pvh_start_xen starts running on the identity mapped
+ * page tables, but xen_prepare_pvh calls into the high mapping.
+ * These page tables need to be relocatable and are only used until
+ * startup_64 transitions to init_top_pgt.
+ */
+SYM_DATA_START_PAGE_ALIGNED(pvh_init_top_pgt)
+ .quad pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+ .org pvh_init_top_pgt + L4_PAGE_OFFSET*8, 0
+ .quad pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+ .org pvh_init_top_pgt + L4_START_KERNEL*8, 0
+ /* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+ .quad pvh_level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
+SYM_DATA_END(pvh_init_top_pgt)
+
+SYM_DATA_START_PAGE_ALIGNED(pvh_level3_ident_pgt)
+ .quad pvh_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+ .fill 511, 8, 0
+SYM_DATA_END(pvh_level3_ident_pgt)
+SYM_DATA_START_PAGE_ALIGNED(pvh_level2_ident_pgt)
+ /*
+ * Since I easily can, map the first 1G.
+ * Don't set NX because code runs from these pages.
+ *
+ * Note: This sets _PAGE_GLOBAL despite whether
+ * the CPU supports it or it is enabled. But,
+ * the CPU should ignore the bit.
+ */
+ PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
+SYM_DATA_END(pvh_level2_ident_pgt)
+SYM_DATA_START_PAGE_ALIGNED(pvh_level3_kernel_pgt)
+ .fill L3_START_KERNEL,8,0
+ /* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
+ .quad pvh_level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+ .quad 0 /* no fixmap */
+SYM_DATA_END(pvh_level3_kernel_pgt)
+
+SYM_DATA_START_PAGE_ALIGNED(pvh_level2_kernel_pgt)
+ /*
+ * Kernel high mapping.
+ *
+ * The kernel code+data+bss must be located below KERNEL_IMAGE_SIZE in
+ * virtual address space, which is 1 GiB if RANDOMIZE_BASE is enabled,
+ * 512 MiB otherwise.
+ *
+ * (NOTE: after that starts the module area, see MODULES_VADDR.)
+ *
+ * This table is eventually used by the kernel during normal runtime.
+ * Care must be taken to clear out undesired bits later, like _PAGE_RW
+ * or _PAGE_GLOBAL in some cases.
+ */
+ PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE/PMD_SIZE)
+SYM_DATA_END(pvh_level2_kernel_pgt)
+
+ ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_RELOC,
+ .long CONFIG_PHYSICAL_ALIGN;
+ .long LOAD_PHYSICAL_ADDR;
+ .long KERNEL_IMAGE_SIZE - 1)
+#endif
+
ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,
_ASM_PTR (pvh_start_xen - __START_KERNEL_map))
--
2.44.0
phys_base needs to be set for __pa() to work in xen_pvh_init() when
finding the hypercall page. Set it before calling into
xen_prepare_pvh(), which calls xen_pvh_init(). Clear it afterward to
avoid __startup_64() adding to it and creating an incorrect value.
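For context, __pa() on a kernel-image virtual address ends up in
__phys_addr(), which depends on phys_base roughly like this (a
paraphrased sketch of the logic, not a verbatim copy of the kernel
source):

    /* Sketch: translating a kernel virtual address to a physical one. */
    unsigned long __phys_addr_sketch(unsigned long x)
    {
            if (x >= __START_KERNEL_map)
                    /* Kernel text/data mapping: needs the load offset. */
                    return x - __START_KERNEL_map + phys_base;

            /* Direct-map addresses just subtract PAGE_OFFSET. */
            return x - PAGE_OFFSET;
    }

With phys_base still zero after a relocated load, the result would be off
by the load offset, so the hypercall page lookup in xen_pvh_init() would
use the wrong physical address.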
Signed-off-by: Jason Andryuk <[email protected]>
---
Instead of setting and clearing phys_base, a dedicated variable could be
used just for the hypercall page. Having phys_base set properly may
avoid further issues if the use of phys_base or __pa() grows.
---
arch/x86/platform/pvh/head.S | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index bb1e582e32b1..c08d08d8cc92 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -125,7 +125,17 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
xor %edx, %edx
wrmsr
+ /* Calculate load offset from LOAD_PHYSICAL_ADDR and store in
+ * phys_base. __pa() needs phys_base set to calculate the
+ * hypercall page in xen_pvh_init(). */
+ movq %rbp, %rbx
+ subq $LOAD_PHYSICAL_ADDR, %rbx
+ movq %rbx, phys_base(%rip)
call xen_prepare_pvh
+ /* Clear phys_base. __startup_64 will *add* to its value,
+ * so reset to 0. */
+ xor %rbx, %rbx
+ movq %rbx, phys_base(%rip)
/* startup_64 expects boot_params in %rsi. */
lea rva(pvh_bootparams)(%ebp), %rsi
--
2.44.0
Sync Xen's elfnote.h header from xen.git to pull in the
XEN_ELFNOTE_PHYS32_RELOC define.
xen commit dfc9fab00378 ("x86/PVH: Support relocatable dom0 kernels")
This is a copy except for the removal of the emacs editor config at the
end of the file.
Signed-off-by: Jason Andryuk <[email protected]>
---
include/xen/interface/elfnote.h | 93 +++++++++++++++++++++++++++++++--
1 file changed, 88 insertions(+), 5 deletions(-)
diff --git a/include/xen/interface/elfnote.h b/include/xen/interface/elfnote.h
index 38deb1214613..918f47d87d7a 100644
--- a/include/xen/interface/elfnote.h
+++ b/include/xen/interface/elfnote.h
@@ -11,7 +11,9 @@
#define __XEN_PUBLIC_ELFNOTE_H__
/*
- * The notes should live in a SHT_NOTE segment and have "Xen" in the
+ * `incontents 200 elfnotes ELF notes
+ *
+ * The notes should live in a PT_NOTE segment and have "Xen" in the
* name field.
*
* Numeric types are either 4 or 8 bytes depending on the content of
@@ -22,6 +24,8 @@
*
* String values (for non-legacy) are NULL terminated ASCII, also known
* as ASCIZ type.
+ *
+ * Xen only uses ELF Notes contained in x86 binaries.
*/
/*
@@ -52,7 +56,7 @@
#define XEN_ELFNOTE_VIRT_BASE 3
/*
- * The offset of the ELF paddr field from the acutal required
+ * The offset of the ELF paddr field from the actual required
* pseudo-physical address (numeric).
*
* This is used to maintain backwards compatibility with older kernels
@@ -92,7 +96,12 @@
#define XEN_ELFNOTE_LOADER 8
/*
- * The kernel supports PAE (x86/32 only, string = "yes" or "no").
+ * The kernel supports PAE (x86/32 only, string = "yes", "no" or
+ * "bimodal").
+ *
+ * For compatibility with Xen 3.0.3 and earlier the "bimodal" setting
+ * may be given as "yes,bimodal" which will cause older Xen to treat
+ * this kernel as PAE.
*
* LEGACY: PAE (n.b. The legacy interface included a provision to
* indicate 'extended-cr3' support allowing L3 page tables to be
@@ -149,7 +158,9 @@
* The (non-default) location the initial phys-to-machine map should be
* placed at by the hypervisor (Dom0) or the tools (DomU).
* The kernel must be prepared for this mapping to be established using
- * large pages, despite such otherwise not being available to guests.
+ * large pages, despite such otherwise not being available to guests. Note
+ * that these large pages may be misaligned in PFN space (they'll obviously
+ * be aligned in MFN and virtual address spaces).
* The kernel must also be able to handle the page table pages used for
* this mapping not being accessible through the initial mapping.
* (Only x86-64 supports this at present.)
@@ -185,9 +196,81 @@
*/
#define XEN_ELFNOTE_PHYS32_ENTRY 18
+/*
+ * Physical loading constraints for PVH kernels
+ *
+ * The presence of this note indicates the kernel supports relocating itself.
+ *
+ * The note may include up to three 32bit values to place constraints on the
+ * guest physical loading addresses and alignment for a PVH kernel. Values
+ * are read in the following order:
+ * - a required start alignment (default 0x200000)
+ * - a minimum address for the start of the image (default 0; see below)
+ * - a maximum address for the last byte of the image (default 0xffffffff)
+ *
+ * When this note specifies an alignment value, it is used. Otherwise the
+ * maximum p_align value from loadable ELF Program Headers is used, if it is
+ * greater than or equal to 4k (0x1000). Otherwise, the default is used.
+ */
+#define XEN_ELFNOTE_PHYS32_RELOC 19
+
/*
* The number of the highest elfnote defined.
*/
-#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_ENTRY
+#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_RELOC
+
+/*
+ * System information exported through crash notes.
+ *
+ * The kexec / kdump code will create one XEN_ELFNOTE_CRASH_INFO
+ * note in case of a system crash. This note will contain various
+ * information about the system, see xen/include/xen/elfcore.h.
+ */
+#define XEN_ELFNOTE_CRASH_INFO 0x1000001
+
+/*
+ * System registers exported through crash notes.
+ *
+ * The kexec / kdump code will create one XEN_ELFNOTE_CRASH_REGS
+ * note per cpu in case of a system crash. This note is architecture
+ * specific and will contain registers not saved in the "CORE" note.
+ * See xen/include/xen/elfcore.h for more information.
+ */
+#define XEN_ELFNOTE_CRASH_REGS 0x1000002
+
+
+/*
+ * xen dump-core none note.
+ * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_NONE
+ * in its dump file to indicate that the file is xen dump-core
+ * file. This note doesn't have any other information.
+ * See tools/libxc/xc_core.h for more information.
+ */
+#define XEN_ELFNOTE_DUMPCORE_NONE 0x2000000
+
+/*
+ * xen dump-core header note.
+ * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_HEADER
+ * in its dump file.
+ * See tools/libxc/xc_core.h for more information.
+ */
+#define XEN_ELFNOTE_DUMPCORE_HEADER 0x2000001
+
+/*
+ * xen dump-core xen version note.
+ * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_XEN_VERSION
+ * in its dump file. It contains the xen version obtained via the
+ * XENVER hypercall.
+ * See tools/libxc/xc_core.h for more information.
+ */
+#define XEN_ELFNOTE_DUMPCORE_XEN_VERSION 0x2000002
+
+/*
+ * xen dump-core format version note.
+ * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_FORMAT_VERSION
+ * in its dump file. It contains a format version identifier.
+ * See tools/libxc/xc_core.h for more information.
+ */
+#define XEN_ELFNOTE_DUMPCORE_FORMAT_VERSION 0x2000003
#endif /* __XEN_PUBLIC_ELFNOTE_H__ */
--
2.44.0
The PVH entry point will need an additional set of prebuilt page tables.
Move the macros and defines to a new header so they can be re-used.
Signed-off-by: Jason Andryuk <[email protected]>
---
checkpatch.pl gives an error: "ERROR: Macros with multiple statements
should be enclosed in a do - while loop" about the moved PMDS macro.
But PMDS is an assembler macro, so it's not applicable.
---
arch/x86/kernel/head_64.S | 22 ++--------------------
arch/x86/kernel/pgtable_64_helpers.h | 28 ++++++++++++++++++++++++++++
2 files changed, 30 insertions(+), 20 deletions(-)
create mode 100644 arch/x86/kernel/pgtable_64_helpers.h
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d4918d03efb4..4b036f3220f2 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -27,17 +27,12 @@
#include <asm/fixmap.h>
#include <asm/smp.h>
+#include "pgtable_64_helpers.h"
+
/*
* We are not able to switch in one step to the final KERNEL ADDRESS SPACE
* because we need identity-mapped pages.
*/
-#define l4_index(x) (((x) >> 39) & 511)
-#define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
-
-L4_PAGE_OFFSET = l4_index(__PAGE_OFFSET_BASE_L4)
-L4_START_KERNEL = l4_index(__START_KERNEL_map)
-
-L3_START_KERNEL = pud_index(__START_KERNEL_map)
.text
__HEAD
@@ -619,9 +614,6 @@ SYM_CODE_START_NOALIGN(vc_no_ghcb)
SYM_CODE_END(vc_no_ghcb)
#endif
-#define SYM_DATA_START_PAGE_ALIGNED(name) \
- SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE)
-
#ifdef CONFIG_PAGE_TABLE_ISOLATION
/*
* Each PGD needs to be 8k long and 8k aligned. We do not
@@ -643,14 +635,6 @@ SYM_CODE_END(vc_no_ghcb)
#define PTI_USER_PGD_FILL 0
#endif
-/* Automate the creation of 1 to 1 mapping pmd entries */
-#define PMDS(START, PERM, COUNT) \
- i = 0 ; \
- .rept (COUNT) ; \
- .quad (START) + (i << PMD_SHIFT) + (PERM) ; \
- i = i + 1 ; \
- .endr
-
__INITDATA
.balign 4
@@ -749,8 +733,6 @@ SYM_DATA_START_PAGE_ALIGNED(level1_fixmap_pgt)
.endr
SYM_DATA_END(level1_fixmap_pgt)
-#undef PMDS
-
.data
.align 16
diff --git a/arch/x86/kernel/pgtable_64_helpers.h b/arch/x86/kernel/pgtable_64_helpers.h
new file mode 100644
index 000000000000..0ae87d768ce2
--- /dev/null
+++ b/arch/x86/kernel/pgtable_64_helpers.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PGTABLES_64_H__
+#define __PGTABLES_64_H__
+
+#ifdef __ASSEMBLY__
+
+#define l4_index(x) (((x) >> 39) & 511)
+#define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
+
+L4_PAGE_OFFSET = l4_index(__PAGE_OFFSET_BASE_L4)
+L4_START_KERNEL = l4_index(__START_KERNEL_map)
+
+L3_START_KERNEL = pud_index(__START_KERNEL_map)
+
+#define SYM_DATA_START_PAGE_ALIGNED(name) \
+ SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE)
+
+/* Automate the creation of 1 to 1 mapping pmd entries */
+#define PMDS(START, PERM, COUNT) \
+ i = 0 ; \
+ .rept (COUNT) ; \
+ .quad (START) + (i << PMD_SHIFT) + (PERM) ; \
+ i = i + 1 ; \
+ .endr
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __PGTABLES_64_H__ */
--
2.44.0
On Wed, Apr 10, 2024 at 3:50 PM Jason Andryuk <[email protected]> wrote:
>
> The PVH entrypoint is 32bit non-PIC code running the uncompressed
> vmlinux at its load address CONFIG_PHYSICAL_START - default 0x1000000
> (16MB). The kernel is loaded at that physical address inside the VM by
> the VMM software (Xen/QEMU).
>
> When running a Xen PVH Dom0, the host reserved addresses are mapped 1-1
> into the PVH container. There exist system firmwares (Coreboot/EDK2)
> with reserved memory at 16MB. This creates a conflict where the PVH
> kernel cannot be loaded at that address.
>
> Modify the PVH entrypoint to be position-independent to allow flexibility
> in load address. Only the 64bit entry path is converted. A 32bit
> kernel is not PIC, so calls into other parts of the kernel, like
> xen_prepare_pvh() and mk_pgtable_32(), don't work properly when
> relocated.
>
> This makes the code PIC, but the page tables need to be updated as well
> to handle running from the kernel high map.
>
> The UNWIND_HINT_END_OF_STACK is to silence:
> vmlinux.o: warning: objtool: pvh_start_xen+0x7f: unreachable instruction
> after the lret into 64bit code.
>
> Signed-off-by: Jason Andryuk <[email protected]>
> ---
> ---
> arch/x86/platform/pvh/head.S | 44 ++++++++++++++++++++++++++++--------
> 1 file changed, 34 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
> index f7235ef87bc3..bb1e582e32b1 100644
> --- a/arch/x86/platform/pvh/head.S
> +++ b/arch/x86/platform/pvh/head.S
> @@ -7,6 +7,7 @@
> .code32
> .text
> #define _pa(x) ((x) - __START_KERNEL_map)
> +#define rva(x) ((x) - pvh_start_xen)
>
> #include <linux/elfnote.h>
> #include <linux/init.h>
> @@ -54,7 +55,25 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
> UNWIND_HINT_END_OF_STACK
> cld
>
> - lgdt (_pa(gdt))
> + /*
> + * See the comment for startup_32 for more details. We need to
> + * execute a call to get the execution address to be position
> + * independent, but we don't have a stack. Save and restore the
> + * magic field of start_info in ebx, and use that as the stack.
> + */
> + mov (%ebx), %eax
> + leal 4(%ebx), %esp
> + ANNOTATE_INTRA_FUNCTION_CALL
> + call 1f
> +1: popl %ebp
> + mov %eax, (%ebx)
> + subl $rva(1b), %ebp
> + movl $0, %esp
> +
> + leal rva(gdt)(%ebp), %eax
> + leal rva(gdt_start)(%ebp), %ecx
> + movl %ecx, 2(%eax)
> + lgdt (%eax)
>
> mov $PVH_DS_SEL,%eax
> mov %eax,%ds
> @@ -62,14 +81,14 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
> mov %eax,%ss
>
> /* Stash hvm_start_info. */
> - mov $_pa(pvh_start_info), %edi
> + leal rva(pvh_start_info)(%ebp), %edi
> mov %ebx, %esi
> - mov _pa(pvh_start_info_sz), %ecx
> + movl rva(pvh_start_info_sz)(%ebp), %ecx
> shr $2,%ecx
> rep
> movsl
>
> - mov $_pa(early_stack_end), %esp
> + leal rva(early_stack_end)(%ebp), %esp
>
> /* Enable PAE mode. */
> mov %cr4, %eax
> @@ -84,28 +103,33 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
> wrmsr
>
> /* Enable pre-constructed page tables. */
> - mov $_pa(init_top_pgt), %eax
> + leal rva(init_top_pgt)(%ebp), %eax
> mov %eax, %cr3
> mov $(X86_CR0_PG | X86_CR0_PE), %eax
> mov %eax, %cr0
>
> /* Jump to 64-bit mode. */
> - ljmp $PVH_CS_SEL, $_pa(1f)
> + pushl $PVH_CS_SEL
> + leal rva(1f)(%ebp), %eax
> + pushl %eax
> + lretl
>
> /* 64-bit entry point. */
> .code64
> 1:
> + UNWIND_HINT_END_OF_STACK
> +
> /* Set base address in stack canary descriptor. */
> mov $MSR_GS_BASE,%ecx
> - mov $_pa(canary), %eax
> + leal rva(canary)(%ebp), %eax
Since this is in 64-bit mode, RIP-relative addressing can be used.
> xor %edx, %edx
> wrmsr
>
> call xen_prepare_pvh
>
> /* startup_64 expects boot_params in %rsi. */
> - mov $_pa(pvh_bootparams), %rsi
> - mov $_pa(startup_64), %rax
> + lea rva(pvh_bootparams)(%ebp), %rsi
> + lea rva(startup_64)(%ebp), %rax
RIP-relative here too.
> ANNOTATE_RETPOLINE_SAFE
> jmp *%rax
>
> @@ -143,7 +167,7 @@ SYM_CODE_END(pvh_start_xen)
> .balign 8
> SYM_DATA_START_LOCAL(gdt)
> .word gdt_end - gdt_start
> - .long _pa(gdt_start)
> + .long _pa(gdt_start) /* x86-64 will overwrite if relocated. */
> .word 0
> SYM_DATA_END(gdt)
> SYM_DATA_START_LOCAL(gdt_start)
> --
> 2.44.0
>
>
Brian Gerst
On 2024-04-10 17:00, Brian Gerst wrote:
> On Wed, Apr 10, 2024 at 3:50 PM Jason Andryuk <[email protected]> wrote:
>> /* 64-bit entry point. */
>> .code64
>> 1:
>> + UNWIND_HINT_END_OF_STACK
>> +
>> /* Set base address in stack canary descriptor. */
>> mov $MSR_GS_BASE,%ecx
>> - mov $_pa(canary), %eax
>> + leal rva(canary)(%ebp), %eax
>
> Since this is in 64-bit mode, RIP-relative addressing can be used.
>
>> xor %edx, %edx
>> wrmsr
>>
>> call xen_prepare_pvh
>>
>> /* startup_64 expects boot_params in %rsi. */
>> - mov $_pa(pvh_bootparams), %rsi
>> - mov $_pa(startup_64), %rax
>> + lea rva(pvh_bootparams)(%ebp), %rsi
>> + lea rva(startup_64)(%ebp), %rax
>
> RIP-relative here too.
Yes, thanks for catching that. With the RIP-relative conversion, there
is now:
vmlinux.o: warning: objtool: pvh_start_xen+0x10d: relocation to !ENDBR:
startup_64+0x0
I guess RIP-relative made it visible. That can be quieted by adding
ANNOTATE_NOENDBR to startup_64.
Thanks,
Jason
On Thu, Apr 11, 2024 at 11:26 AM Jason Andryuk <[email protected]> wrote:
>
> On 2024-04-10 17:00, Brian Gerst wrote:
> > On Wed, Apr 10, 2024 at 3:50 PM Jason Andryuk <[email protected]> wrote:
>
> >> /* 64-bit entry point. */
> >> .code64
> >> 1:
> >> + UNWIND_HINT_END_OF_STACK
> >> +
> >> /* Set base address in stack canary descriptor. */
> >> mov $MSR_GS_BASE,%ecx
> >> - mov $_pa(canary), %eax
> >> + leal rva(canary)(%ebp), %eax
> >
> > Since this is in 64-bit mode, RIP-relative addressing can be used.
> >
> >> xor %edx, %edx
> >> wrmsr
> >>
> >> call xen_prepare_pvh
> >>
> >> /* startup_64 expects boot_params in %rsi. */
> >> - mov $_pa(pvh_bootparams), %rsi
> >> - mov $_pa(startup_64), %rax
> >> + lea rva(pvh_bootparams)(%ebp), %rsi
> >> + lea rva(startup_64)(%ebp), %rax
> >
> > RIP-relative here too.
>
> Yes, thanks for catching that. With the RIP-relative conversion, there
> is now:
> vmlinux.o: warning: objtool: pvh_start_xen+0x10d: relocation to !ENDBR:
> startup_64+0x0
>
> I guess RIP-relative made it visible. That can be quieted by adding
> ANNOTATE_NOENDBR to startup_64.
Change it to a direct jump, since branches are always RIP-relative.
Brian Gerst
On 10.04.24 21:48, Jason Andryuk wrote:
> Sync Xen's elfnote.h header from xen.git to pull in the
> XEN_ELFNOTE_PHYS32_RELOC define.
>
> xen commit dfc9fab00378 ("x86/PVH: Support relocatable dom0 kernels")
>
> This is a copy except for the removal of the emacs editor config at the
> end of the file.
>
> Signed-off-by: Jason Andryuk <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Juergen
On 10.04.24 21:48, Jason Andryuk wrote:
> phys_base needs to be set for __pa() to work in xen_pvh_init() when
> finding the hypercall page. Set it before calling into
> xen_prepare_pvh(), which calls xen_pvh_init(). Clear it afterward to
> avoid __startup_64() adding to it and creating an incorrect value.
>
> Signed-off-by: Jason Andryuk <[email protected]>
> ---
> Instead of setting and clearing phys_base, a dedicated variable could be
> used just for the hypercall page. Having phys_base set properly may
> avoid further issues if the use of phys_base or __pa() grows.
> ---
> arch/x86/platform/pvh/head.S | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
> index bb1e582e32b1..c08d08d8cc92 100644
> --- a/arch/x86/platform/pvh/head.S
> +++ b/arch/x86/platform/pvh/head.S
> @@ -125,7 +125,17 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
> xor %edx, %edx
> wrmsr
>
> + /* Calculate load offset from LOAD_PHYSICAL_ADDR and store in
> + * phys_base. __pa() needs phys_base set to calculate the
> + * hypercall page in xen_pvh_init(). */
Please use the correct style for multi-line comments:
/*
* comment lines
* comment lines
*/
> + movq %rbp, %rbx
> + subq $LOAD_PHYSICAL_ADDR, %rbx
> + movq %rbx, phys_base(%rip)
> call xen_prepare_pvh
> + /* Clear phys_base. __startup_64 will *add* to its value,
> + * so reset to 0. */
Comment style again.
> + xor %rbx, %rbx
> + movq %rbx, phys_base(%rip)
>
> /* startup_64 expects boot_params in %rsi. */
> lea rva(pvh_bootparams)(%ebp), %rsi
With above fixed:
Reviewed-by: Juergen Gross <[email protected]>
Juergen
On 10.04.24 21:48, Jason Andryuk wrote:
> The PVH entry point will need an additional set of prebuilt page tables.
> Move the macros and defines to a new header so they can be re-used.
>
> Signed-off-by: Jason Andryuk <[email protected]>
With the one nit below addressed:
Reviewed-by: Juergen Gross <[email protected]>
...
> diff --git a/arch/x86/kernel/pgtable_64_helpers.h b/arch/x86/kernel/pgtable_64_helpers.h
> new file mode 100644
> index 000000000000..0ae87d768ce2
> --- /dev/null
> +++ b/arch/x86/kernel/pgtable_64_helpers.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __PGTABLES_64_H__
> +#define __PGTABLES_64_H__
> +
> +#ifdef __ASSEMBLY__
> +
> +#define l4_index(x) (((x) >> 39) & 511)
> +#define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
Please fix the minor style issue in this line by s/-/ - /
Juergen
On 10.04.24 21:48, Jason Andryuk wrote:
> The PVH entry point is 32bit. For a 64bit kernel, the entry point must
> switch to 64bit mode, which requires a set of page tables. In the past,
> PVH used init_top_pgt.
>
> This works fine when the kernel is loaded at LOAD_PHYSICAL_ADDR, as the
> page tables are prebuilt for this address. If the kernel is loaded at a
> different address, they need to be adjusted.
>
> __startup_64() adjusts the prebuilt page tables for the physical load
> address, but it is 64bit code. The 32bit PVH entry code can't call it
> to adjust the page tables, so it can't readily be re-used.
>
> 64bit PVH entry needs page tables set up for identity map, the kernel
> high map and the direct map. pvh_start_xen() enters identity mapped.
> Inside xen_prepare_pvh(), it jumps through a pv_ops function pointer
> into the highmap. The direct map is used for __va() on the initramfs
> and other guest physical addresses.
>
> Add a dedicated set of prebuilt page tables for PVH entry. They are
> adjusted in assembly before loading.
>
> Add XEN_ELFNOTE_PHYS32_RELOC to indicate support for relocation
> along with the kernel's loading constraints. The maximum load address,
> KERNEL_IMAGE_SIZE - 1, is determined by a single pvh_level2_ident_pgt
> page. It could be larger with more pages.
>
> Signed-off-by: Jason Andryuk <[email protected]>
> ---
> Instead of adding 5 pages of prebuilt page tables, they could be
> constructed dynamically in the .bss area. They are then only used for
> PVH entry and until transitioning to init_top_pgt. The .bss is later
> cleared. It's safer to add the dedicated pages, so that is done here.
> ---
> arch/x86/platform/pvh/head.S | 105 ++++++++++++++++++++++++++++++++++-
> 1 file changed, 104 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
> index c08d08d8cc92..4af3cfbcf2f8 100644
> --- a/arch/x86/platform/pvh/head.S
> +++ b/arch/x86/platform/pvh/head.S
> @@ -21,6 +21,8 @@
> #include <asm/nospec-branch.h>
> #include <xen/interface/elfnote.h>
>
> +#include "../kernel/pgtable_64_helpers.h"
> +
> __HEAD
>
> /*
> @@ -102,8 +104,47 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
> btsl $_EFER_LME, %eax
> wrmsr
>
> + mov %ebp, %ebx
> + subl $LOAD_PHYSICAL_ADDR, %ebx /* offset */
> + jz .Lpagetable_done
> +
> + /* Fixup page-tables for relocation. */
> + leal rva(pvh_init_top_pgt)(%ebp), %edi
> + movl $512, %ecx
Please use PTRS_PER_PGD instead of the literal 512. Similar issue below.
> +2:
> + testl $_PAGE_PRESENT, 0x00(%edi)
> + jz 1f
> + addl %ebx, 0x00(%edi)
> +1:
> + addl $8, %edi
> + decl %ecx
> + jnz 2b
> +
> + /* L3 ident has a single entry. */
> + leal rva(pvh_level3_ident_pgt)(%ebp), %edi
> + addl %ebx, 0x00(%edi)
> +
> + leal rva(pvh_level3_kernel_pgt)(%ebp), %edi
> + addl %ebx, (4096 - 16)(%edi)
> + addl %ebx, (4096 - 8)(%edi)
PAGE_SIZE instead of 4096, please.
> +
> + /* pvh_level2_ident_pgt is fine - large pages */
> +
> + /* pvh_level2_kernel_pgt needs adjustment - large pages */
> + leal rva(pvh_level2_kernel_pgt)(%ebp), %edi
> + movl $512, %ecx
> +2:
> + testl $_PAGE_PRESENT, 0x00(%edi)
> + jz 1f
> + addl %ebx, 0x00(%edi)
> +1:
> + addl $8, %edi
> + decl %ecx
> + jnz 2b
> +
> +.Lpagetable_done:
> /* Enable pre-constructed page tables. */
> - leal rva(init_top_pgt)(%ebp), %eax
> + leal rva(pvh_init_top_pgt)(%ebp), %eax
> mov %eax, %cr3
> mov $(X86_CR0_PG | X86_CR0_PE), %eax
> mov %eax, %cr0
> @@ -197,5 +238,67 @@ SYM_DATA_START_LOCAL(early_stack)
> .fill BOOT_STACK_SIZE, 1, 0
> SYM_DATA_END_LABEL(early_stack, SYM_L_LOCAL, early_stack_end)
>
> +#ifdef CONFIG_X86_64
> +/*
> + * Xen PVH needs a set of identity mapped and kernel high mapping
> + * page tables. pvh_start_xen starts running on the identity mapped
> + * page tables, but xen_prepare_pvh calls into the high mapping.
> + * These page tables need to be relocatable and are only used until
> + * startup_64 transitions to init_top_pgt.
> + */
> +SYM_DATA_START_PAGE_ALIGNED(pvh_init_top_pgt)
> + .quad pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
> + .org pvh_init_top_pgt + L4_PAGE_OFFSET*8, 0
Please add a space before and after the '*'.
> + .quad pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
> + .org pvh_init_top_pgt + L4_START_KERNEL*8, 0
> + /* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
> + .quad pvh_level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
> +SYM_DATA_END(pvh_init_top_pgt)
> +
> +SYM_DATA_START_PAGE_ALIGNED(pvh_level3_ident_pgt)
> + .quad pvh_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
> + .fill 511, 8, 0
> +SYM_DATA_END(pvh_level3_ident_pgt)
> +SYM_DATA_START_PAGE_ALIGNED(pvh_level2_ident_pgt)
> + /*
> + * Since I easily can, map the first 1G.
> + * Don't set NX because code runs from these pages.
> + *
> + * Note: This sets _PAGE_GLOBAL despite whether
> + * the CPU supports it or it is enabled. But,
> + * the CPU should ignore the bit.
> + */
> + PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
> +SYM_DATA_END(pvh_level2_ident_pgt)
> +SYM_DATA_START_PAGE_ALIGNED(pvh_level3_kernel_pgt)
> + .fill L3_START_KERNEL,8,0
Spaces after the commas.
> + /* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
> + .quad pvh_level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
> + .quad 0 /* no fixmap */
> +SYM_DATA_END(pvh_level3_kernel_pgt)
> +
> +SYM_DATA_START_PAGE_ALIGNED(pvh_level2_kernel_pgt)
> + /*
> + * Kernel high mapping.
> + *
> + * The kernel code+data+bss must be located below KERNEL_IMAGE_SIZE in
> + * virtual address space, which is 1 GiB if RANDOMIZE_BASE is enabled,
> + * 512 MiB otherwise.
> + *
> + * (NOTE: after that starts the module area, see MODULES_VADDR.)
> + *
> + * This table is eventually used by the kernel during normal runtime.
> + * Care must be taken to clear out undesired bits later, like _PAGE_RW
> + * or _PAGE_GLOBAL in some cases.
> + */
> + PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE/PMD_SIZE)
Spaces around '/'.
> +SYM_DATA_END(pvh_level2_kernel_pgt)
> +
> + ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_RELOC,
> + .long CONFIG_PHYSICAL_ALIGN;
> + .long LOAD_PHYSICAL_ADDR;
> + .long KERNEL_IMAGE_SIZE - 1)
> +#endif
> +
> ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,
> _ASM_PTR (pvh_start_xen - __START_KERNEL_map))
Juergen
On Wed, Apr 10 2024 at 15:48, Jason Andryuk wrote:
> ---
> arch/x86/kernel/head_64.S | 22 ++--------------------
> arch/x86/kernel/pgtable_64_helpers.h | 28 ++++++++++++++++++++++++++++
That's the wrong place as you want to include it from arch/x86/platform.
arch/x86/include/asm/....
Thanks,
tglx
On Thu, May 23, 2024 at 03:59:43PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 10 2024 at 15:48, Jason Andryuk wrote:
> > ---
> > arch/x86/kernel/head_64.S | 22 ++--------------------
> > arch/x86/kernel/pgtable_64_helpers.h | 28 ++++++++++++++++++++++++++++
>
> That's the wrong place as you want to include it from arch/x86/platform.
>
> arch/x86/include/asm/....
.. and there already is a header waiting:
arch/x86/include/asm/pgtable_64.h
so no need for a new one.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette