The physical memory range that kexec selects for the compressed
bzimage target kernel, might not be where it runs from. The
startup_64() code in head_64.S copies itself out of the way
before the decompression so it doesn't clobber itself.
If the start of the memory range selected by kexec is above
LOAD_PHYSICAL_ADDR (0x01000000 by default), then the copy remains
within the memory area. But if the start is below this range,
then the copy will likely end up outside the range.
Usually, this will be harmless because not much memory is in use
at the time of the pre-decompression copy, so there is little
to accidentally clobber. However, an unlucky choice for the
adress of the kernel and the initrd could put the initrd in harm's
way. For example:
0x00400000 - physical address for target kernel
0x03ff8000 - physical address of seven-page initrd
0x0302c000 - size of uncompressed kernel (about 50 Mbytes)
The decompressed kernel will span 0x01000000 through 0x0402c000,
which will overwrite the initrd.
If the kexec code restricts itself to physical addresses above
0x01000000, then the pre-decompression copy and the decompression
itself will stay within the bounds of the memory kexec selected
(unless a non-default value is used in the target kernel for
CONFIG_PHYSICAL_START, which will change LOAD_PHYSICAL_ADDR,
but that's probably unsolvable unless the target kernel were to
somehow communicate this to kexec).
Signed-off-by: John Sperbeck <[email protected]>
---
arch/x86/kernel/kexec-bzimage64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index a61c12c01270..d6bf6c13dab1 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -36,7 +36,7 @@
*/
#define MIN_PURGATORY_ADDR 0x3000
#define MIN_BOOTPARAM_ADDR 0x3000
-#define MIN_KERNEL_LOAD_ADDR 0x100000
+#define MIN_KERNEL_LOAD_ADDR 0x1000000
#define MIN_INITRD_LOAD_ADDR 0x1000000
/*
--
2.42.0.655.g421f12c284-goog
On October 22, 2023 7:31:21 PM PDT, John Sperbeck <[email protected]> wrote:
>The physical memory range that kexec selects for the compressed
>bzimage target kernel, might not be where it runs from. The
>startup_64() code in head_64.S copies itself out of the way
>before the decompression so it doesn't clobber itself.
>
>If the start of the memory range selected by kexec is above
>LOAD_PHYSICAL_ADDR (0x01000000 by default), then the copy remains
>within the memory area. But if the start is below this range,
>then the copy will likely end up outside the range.
>
>Usually, this will be harmless because not much memory is in use
>at the time of the pre-decompression copy, so there is little
>to accidentally clobber. However, an unlucky choice for the
>adress of the kernel and the initrd could put the initrd in harm's
>way. For example:
>
> 0x00400000 - physical address for target kernel
> 0x03ff8000 - physical address of seven-page initrd
> 0x0302c000 - size of uncompressed kernel (about 50 Mbytes)
>
>The decompressed kernel will span 0x01000000 through 0x0402c000,
>which will overwrite the initrd.
>
>If the kexec code restricts itself to physical addresses above
>0x01000000, then the pre-decompression copy and the decompression
>itself will stay within the bounds of the memory kexec selected
>(unless a non-default value is used in the target kernel for
>CONFIG_PHYSICAL_START, which will change LOAD_PHYSICAL_ADDR,
>but that's probably unsolvable unless the target kernel were to
>somehow communicate this to kexec).
>
>Signed-off-by: John Sperbeck <[email protected]>
>---
> arch/x86/kernel/kexec-bzimage64.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
>index a61c12c01270..d6bf6c13dab1 100644
>--- a/arch/x86/kernel/kexec-bzimage64.c
>+++ b/arch/x86/kernel/kexec-bzimage64.c
>@@ -36,7 +36,7 @@
> */
> #define MIN_PURGATORY_ADDR 0x3000
> #define MIN_BOOTPARAM_ADDR 0x3000
>-#define MIN_KERNEL_LOAD_ADDR 0x100000
>+#define MIN_KERNEL_LOAD_ADDR 0x1000000
> #define MIN_INITRD_LOAD_ADDR 0x1000000
>
> /*
This doesn't make any sense to me. There is already a high water mark for his much memory the kernel needs until an initrd or setup_data item can appear. This is just a hack, please fix it properly.
On Sun, Oct 22, 2023 at 7:42 PM H. Peter Anvin <[email protected]> wrote:
>
> On October 22, 2023 7:31:21 PM PDT, John Sperbeck <[email protected]> wrote:
> >The physical memory range that kexec selects for the compressed
> >bzimage target kernel, might not be where it runs from. The
> >startup_64() code in head_64.S copies itself out of the way
> >before the decompression so it doesn't clobber itself.
> >
> >If the start of the memory range selected by kexec is above
> >LOAD_PHYSICAL_ADDR (0x01000000 by default), then the copy remains
> >within the memory area. But if the start is below this range,
> >then the copy will likely end up outside the range.
> >
> >Usually, this will be harmless because not much memory is in use
> >at the time of the pre-decompression copy, so there is little
> >to accidentally clobber. However, an unlucky choice for the
> >adress of the kernel and the initrd could put the initrd in harm's
> >way. For example:
> >
> > 0x00400000 - physical address for target kernel
> > 0x03ff8000 - physical address of seven-page initrd
> > 0x0302c000 - size of uncompressed kernel (about 50 Mbytes)
> >
> >The decompressed kernel will span 0x01000000 through 0x0402c000,
> >which will overwrite the initrd.
> >
> >If the kexec code restricts itself to physical addresses above
> >0x01000000, then the pre-decompression copy and the decompression
> >itself will stay within the bounds of the memory kexec selected
> >(unless a non-default value is used in the target kernel for
> >CONFIG_PHYSICAL_START, which will change LOAD_PHYSICAL_ADDR,
> >but that's probably unsolvable unless the target kernel were to
> >somehow communicate this to kexec).
> >
> >Signed-off-by: John Sperbeck <[email protected]>
> >---
> > arch/x86/kernel/kexec-bzimage64.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
> >index a61c12c01270..d6bf6c13dab1 100644
> >--- a/arch/x86/kernel/kexec-bzimage64.c
> >+++ b/arch/x86/kernel/kexec-bzimage64.c
> >@@ -36,7 +36,7 @@
> > */
> > #define MIN_PURGATORY_ADDR 0x3000
> > #define MIN_BOOTPARAM_ADDR 0x3000
> >-#define MIN_KERNEL_LOAD_ADDR 0x100000
> >+#define MIN_KERNEL_LOAD_ADDR 0x1000000
> > #define MIN_INITRD_LOAD_ADDR 0x1000000
> >
> > /*
>
> This doesn't make any sense to me. There is already a high water mark for his much memory the kernel needs until an initrd or setup_data item can appear. This is just a hack, please fix it properly.
The startup_64() code in head_64.S changes behavior based on whether
it's running below or above LOAD_PHYSICAL_ADDR:
#ifdef CONFIG_RELOCATABLE
leaq startup_32(%rip) /* - $startup_32 */, %rbp
movl BP_kernel_alignment(%rsi), %eax
decl %eax
addq %rax, %rbp
notq %rax
andq %rax, %rbp
cmpq $LOAD_PHYSICAL_ADDR, %rbp
jae 1f
#endif
movq $LOAD_PHYSICAL_ADDR, %rbp
1:
In my example, we were running from address 0x00400000. The %rbp
register will start with 0x00400000, but will be changed to 0x01000000
after the check against LOAD_PHYSICAL_ADDR fails.
The 0x01000000 value in %rbp is passed to extract_kernel as the
'output' argument. Unless choose_random_location() decides
differently, this will be where the kernel is decompressed to. The
size of the kernel is large enough in my example that the
decompression overruns the initrd.
If the startup_64() code didn't have the LOAD_PHYSICAL_ADDR check and
used %rpb as is, then there would be no issue. The decompression
would have been to 0x00400000 and would have completed before reaching
the initrd memory.
That is, the kexec code is being careful to ensure that the kernel and
initrd memory doesn't overlap, but isn't paying attention to what
happens if the kernel memory is below LOAD_PHYSICAL_ADDR (the kernel
address is effectively changed to a different location). My proposed
change makes it aware, and avoids such addresses.