From: Mike Rapoport <[email protected]>
Hi,
Commit a799c2bd29d1 ("x86/setup: Consolidate early memory reservations")
changed the way early memory reservations are made and caused a regression
for users that set CONFIG_X86_RESERVE_LOW to 640K in their kernel
configuration [1] because there was no room for the real mode trampoline.
My initial suggestion was to reduce the limit of CONFIG_X86_RESERVE_LOW
from 640K to 512K [2], but in the end it seems simpler to always reserve
the first 1M of RAM after the real mode trampoline is allocated.
The first patch in the series reworks the early memory reservations so
that the first 64K is reserved very early, before memblock allocations are
possible, and the remaining memory under 1M is reserved after the real
mode trampoline is allocated. This patch also updates the freeing of EFI
boot services so that memory under 1M remains reserved, which is also
required for the crash kernel [3].
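To illustrate the ordering described above, here is a minimal userspace
sketch. It is not kernel code: toy_memblock, toy_reserve(),
toy_find_in_range() and toy_boot() are hypothetical stand-ins for
memblock, the search is bottom-up for simplicity, and the trampoline size
passed in is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K  0x1000ULL
#define SZ_64K 0x10000ULL
#define SZ_1M  0x100000ULL
#define MAX_RESERVED 16

/* Toy stand-in for memblock's reserved list (hypothetical, not kernel API). */
struct toy_range { uint64_t start, end; };	/* half-open [start, end) */
struct toy_memblock {
	struct toy_range reserved[MAX_RESERVED];
	int nr;
};

static void toy_reserve(struct toy_memblock *mb, uint64_t start, uint64_t size)
{
	assert(mb->nr < MAX_RESERVED);
	mb->reserved[mb->nr].start = start;
	mb->reserved[mb->nr].end = start + size;
	mb->nr++;
}

static bool toy_is_free(const struct toy_memblock *mb, uint64_t start,
			uint64_t size)
{
	for (int i = 0; i < mb->nr; i++)
		if (start < mb->reserved[i].end &&
		    mb->reserved[i].start < start + size)
			return false;
	return true;
}

/*
 * Lowest page-aligned free block of 'size' bytes in [lo, hi), or 0 if
 * there is none. Address 0 is never a valid answer here because page 0
 * is always reserved before the search runs.
 */
static uint64_t toy_find_in_range(const struct toy_memblock *mb, uint64_t lo,
				  uint64_t hi, uint64_t size)
{
	for (uint64_t addr = lo; addr + size <= hi; addr += SZ_4K)
		if (toy_is_free(mb, addr, size))
			return addr;
	return 0;
}

/* Returns the trampoline address, or 0 if no room was found under 1M. */
static uint64_t toy_boot(struct toy_memblock *mb, uint64_t trampoline_size,
			 bool reserve_1m_first)
{
	uint64_t mem;

	mb->nr = 0;
	if (reserve_1m_first)
		toy_reserve(mb, 0, SZ_1M);	/* leaves no room at all */
	else
		toy_reserve(mb, 0, SZ_64K);	/* stage 1: very early */

	mem = toy_find_in_range(mb, 0, SZ_1M, trampoline_size);

	toy_reserve(mb, 0, SZ_1M);		/* stage 2: everything under 1M */
	return mem;
}
```

In this model toy_boot(&mb, 0x5000, false) places the trampoline right
above the early 64K reservation, while toy_boot(&mb, 0x5000, true) finds
no room, which is roughly the situation the regression report describes.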
The second and the third patches are cleanups that remove pieces that are
no longer required after the first patch is applied.
Randy, Hugh, I'd appreciate it if you could give this a whirl on your old
Sandy Bridge laptops, as it again changes the way trim_snb_memory() works.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=213177
[2] https://lore.kernel.org/lkml/[email protected]
[3] https://lore.kernel.org/lkml/[email protected]/#r
Mike Rapoport (3):
x86/setup: always reserve the first 1M of RAM
x86/setup: remove CONFIG_X86_RESERVE_LOW and reservelow options
x86/crash: remove crash_reserve_low_1M()
.../admin-guide/kernel-parameters.txt | 5 --
arch/x86/Kconfig | 29 ---------
arch/x86/include/asm/crash.h | 6 --
arch/x86/kernel/crash.c | 13 ----
arch/x86/kernel/setup.c | 59 +++++++------------
arch/x86/platform/efi/quirks.c | 12 ++++
arch/x86/realmode/init.c | 14 +++--
7 files changed, 41 insertions(+), 97 deletions(-)
base-commit: c4681547bcce777daf576925a966ffa824edd09d
--
2.28.0
From: Mike Rapoport <[email protected]>
There are BIOSes that are known to corrupt the memory under 1M, or more
precisely under 640K because the memory above 640K is anyway reserved for
the EGA/VGA frame buffer and BIOS.
To prevent the kernel from using memory that may be clobbered by the BIOS,
the beginning of the memory is always reserved. The exact size of the
reserved area is determined by the CONFIG_X86_RESERVE_LOW build time option
and the reservelow command line option. The reserved range may be from 4K
to 640K with the default of 64K. There are also configurations that reserve
the entire 1M range, like machines with SandyBridge graphics devices or
systems that enable crash kernel.
In addition to the potentially clobbered memory, EBDA of unknown size may
be as low as 128K and the memory above that EBDA start is also reserved
early.
It would have been possible to reserve the entire range under 1M if it were
not for the real mode trampoline, which must reside in that area.
To accommodate placement of the real mode trampoline and keep the memory
safe from being clobbered by BIOS, reserve the first 64K of RAM before
memory allocations are possible and then, after the real mode trampoline is
allocated, reserve the entire range from 0 to 1M.
Update trim_snb_memory() and reserve_real_mode() to avoid redundant
reservations of the same memory range.
Also make sure the memory under 1M is not getting freed by
efi_free_boot_services().
Fixes: a799c2bd29d1 ("x86/setup: Consolidate early memory reservations")
Signed-off-by: Mike Rapoport <[email protected]>
---
arch/x86/kernel/setup.c | 35 ++++++++++++++++++++--------------
arch/x86/platform/efi/quirks.c | 12 ++++++++++++
arch/x86/realmode/init.c | 14 ++++++++------
3 files changed, 41 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 72920af0b3c0..22e9a17d6ac3 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -637,11 +637,11 @@ static void __init trim_snb_memory(void)
* them from accessing certain memory ranges, namely anything below
* 1M and in the pages listed in bad_pages[] above.
*
- * To avoid these pages being ever accessed by SNB gfx devices
- * reserve all memory below the 1 MB mark and bad_pages that have
- * not already been reserved at boot time.
+ * To avoid these pages being ever accessed by SNB gfx devices reserve
+ * bad_pages that have not already been reserved at boot time.
+ * All memory below the 1 MB mark is anyway reserved later during
+ * setup_arch(), so there is no need to reserve it here.
*/
- memblock_reserve(0, 1<<20);
for (i = 0; i < ARRAY_SIZE(bad_pages); i++) {
if (memblock_reserve(bad_pages[i], PAGE_SIZE))
@@ -733,14 +733,14 @@ static void __init early_reserve_memory(void)
* The first 4Kb of memory is a BIOS owned area, but generally it is
* not listed as such in the E820 table.
*
- * Reserve the first memory page and typically some additional
- * memory (64KiB by default) since some BIOSes are known to corrupt
- * low memory. See the Kconfig help text for X86_RESERVE_LOW.
+ * Reserve the first 64K of memory since some BIOSes are known to
+ * corrupt low memory. After the real mode trampoline is allocated the
+ * rest of the memory below 640k is reserved.
*
* In addition, make sure page 0 is always reserved because on
* systems with L1TF its contents can be leaked to user processes.
*/
- memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
+ memblock_reserve(0, SZ_64K);
early_reserve_initrd();
@@ -751,6 +751,7 @@ static void __init early_reserve_memory(void)
reserve_ibft_region();
reserve_bios_regions();
+ trim_snb_memory();
}
/*
@@ -1081,14 +1082,20 @@ void __init setup_arch(char **cmdline_p)
(max_pfn_mapped<<PAGE_SHIFT) - 1);
#endif
- reserve_real_mode();
-
/*
- * Reserving memory causing GPU hangs on Sandy Bridge integrated
- * graphics devices should be done after we allocated memory under
- * 1M for the real mode trampoline.
+ * Find free memory for the real mode trampoline and place it
+ * there.
+ * If there is not enough free memory under 1M, on EFI-enabled
+ * systems there will be additional attempt to reclaim the memory
+ * for the real mode trampoline at efi_free_boot_services().
+ *
+ * Unconditionally reserve the entire first 1M of RAM because
+ * BIOSes are known to corrupt low memory and several
+ * hundred kilobytes are not worth complex detection what memory gets
+ * clobbered. Moreover, on machines with SandyBridge graphics or in
+ * setups that use crashkernel the entire 1M is anyway reserved.
*/
- trim_snb_memory();
+ reserve_real_mode();
init_mem_mapping();
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 7850111008a8..b15ebfe40a73 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -450,6 +450,18 @@ void __init efi_free_boot_services(void)
size -= rm_size;
}
+ /*
+ * Don't free memory under 1M for two reasons:
+ * - BIOS might clobber it
+ * - Crash kernel needs it to be reserved
+ */
+ if (start + size < SZ_1M)
+ continue;
+ if (start < SZ_1M) {
+ size -= (SZ_1M - start);
+ start = SZ_1M;
+ }
+
memblock_free_late(start, size);
}
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 2e1c1bec0f9e..8ea285aca827 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -29,14 +29,16 @@ void __init reserve_real_mode(void)
/* Has to be under 1M so we can execute real-mode AP code. */
mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
- if (!mem) {
+ if (!mem)
pr_info("No sub-1M memory is available for the trampoline\n");
- return;
- }
+ else
+ set_real_mode_mem(mem);
- memblock_reserve(mem, size);
- set_real_mode_mem(mem);
- crash_reserve_low_1M();
+ /*
+ * Unconditionally reserve the entire first 1M, see comment in
+ * setup_arch()
+ */
+ memblock_reserve(0, SZ_1M);
}
static void sme_sev_setup_real_mode(struct trampoline_header *th)
--
2.28.0
From: Mike Rapoport <[email protected]>
The entire memory range under 1M is unconditionally reserved at
setup_arch(), so there is no need for crash_reserve_low_1M() anymore.
Remove this function.
Signed-off-by: Mike Rapoport <[email protected]>
---
arch/x86/include/asm/crash.h | 6 ------
arch/x86/kernel/crash.c | 13 -------------
2 files changed, 19 deletions(-)
diff --git a/arch/x86/include/asm/crash.h b/arch/x86/include/asm/crash.h
index f58de66091e5..8b6bd63530dc 100644
--- a/arch/x86/include/asm/crash.h
+++ b/arch/x86/include/asm/crash.h
@@ -9,10 +9,4 @@ int crash_setup_memmap_entries(struct kimage *image,
struct boot_params *params);
void crash_smp_send_stop(void);
-#ifdef CONFIG_KEXEC_CORE
-void __init crash_reserve_low_1M(void);
-#else
-static inline void __init crash_reserve_low_1M(void) { }
-#endif
-
#endif /* _ASM_X86_CRASH_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 54ce999ed321..e8326a8d1c5d 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -70,19 +70,6 @@ static inline void cpu_crash_vmclear_loaded_vmcss(void)
rcu_read_unlock();
}
-/*
- * When the crashkernel option is specified, only use the low
- * 1M for the real mode trampoline.
- */
-void __init crash_reserve_low_1M(void)
-{
- if (cmdline_find_option(boot_command_line, "crashkernel", NULL, 0) < 0)
- return;
-
- memblock_reserve(0, 1<<20);
- pr_info("Reserving the low 1M of memory for crashkernel\n");
-}
-
#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
--
2.28.0
From: Mike Rapoport <[email protected]>
The CONFIG_X86_RESERVE_LOW build time option and the reservelow command
line option controlled the amount of memory under 1M that would be
reserved at boot to avoid using memory that can potentially be clobbered
by BIOS.
Since the entire range under 1M is now always reserved, there is no need
for these options and they can be removed.
Signed-off-by: Mike Rapoport <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 5 ----
arch/x86/Kconfig | 29 -------------------
arch/x86/kernel/setup.c | 24 ---------------
3 files changed, 58 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cb89dbdedc46..d7d813032c51 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4775,11 +4775,6 @@
Reserves a hole at the top of the kernel virtual
address space.
- reservelow= [X86]
- Format: nn[K]
- Set the amount of memory to reserve for BIOS at
- the bottom of the address space.
-
reset_devices [KNL] Force drivers to reset the underlying device
during initialization.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0045e1b44190..86dae426798b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1693,35 +1693,6 @@ config X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK
Set whether the default state of memory_corruption_check is
on or off.
-config X86_RESERVE_LOW
- int "Amount of low memory, in kilobytes, to reserve for the BIOS"
- default 64
- range 4 640
- help
- Specify the amount of low memory to reserve for the BIOS.
-
- The first page contains BIOS data structures that the kernel
- must not use, so that page must always be reserved.
-
- By default we reserve the first 64K of physical RAM, as a
- number of BIOSes are known to corrupt that memory range
- during events such as suspend/resume or monitor cable
- insertion, so it must not be used by the kernel.
-
- You can set this to 4 if you are absolutely sure that you
- trust the BIOS to get all its memory reservations and usages
- right. If you know your BIOS have problems beyond the
- default 64K area, you can set this to 640 to avoid using the
- entire low memory range.
-
- If you have doubts about the BIOS (e.g. suspend/resume does
- not work or there's kernel crashes after certain hardware
- hotplug events) then you might want to enable
- X86_CHECK_BIOS_CORRUPTION=y to allow the kernel to check
- typical corruption patterns.
-
- Leave this to the default value of 64 if you are unsure.
-
config MATH_EMULATION
bool
depends on MODIFY_LDT_SYSCALL
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 22e9a17d6ac3..9cf24b648c73 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -694,30 +694,6 @@ static void __init e820_add_kernel_range(void)
e820__range_add(start, size, E820_TYPE_RAM);
}
-static unsigned reserve_low = CONFIG_X86_RESERVE_LOW << 10;
-
-static int __init parse_reservelow(char *p)
-{
- unsigned long long size;
-
- if (!p)
- return -EINVAL;
-
- size = memparse(p, &p);
-
- if (size < 4096)
- size = 4096;
-
- if (size > 640*1024)
- size = 640*1024;
-
- reserve_low = size;
-
- return 0;
-}
-
-early_param("reservelow", parse_reservelow);
-
static void __init early_reserve_memory(void)
{
/*
--
2.28.0
On 06/01/21 at 10:53am, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
......
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index 7850111008a8..b15ebfe40a73 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -450,6 +450,18 @@ void __init efi_free_boot_services(void)
> size -= rm_size;
> }
Thanks for taking care of excluding the low 1M in
efi_free_boot_services(), Mike. You might also want to remove the old real
mode exclusion code, since it is now covered by your new code.
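For reference, the low-1M exclusion added by the patch can be modelled in
userspace as below. This is only a sketch: trim_below_1m() is a
hypothetical helper that mirrors the two checks in the hunk, with a false
return standing in for the "continue" in the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M 0x100000ULL

/*
 * Trim a boot services region [*start, *start + *size) so that nothing
 * below 1M is handed back to the allocator.
 *
 * Returns false when the whole region lies below 1M and must be skipped,
 * true when the (possibly trimmed) region may be freed.
 */
static bool trim_below_1m(uint64_t *start, uint64_t *size)
{
	assert(start && size);

	if (*start + *size < SZ_1M)
		return false;

	if (*start < SZ_1M) {
		/* Drop the part below 1M and keep only the tail above it. */
		*size -= SZ_1M - *start;
		*start = SZ_1M;
	}
	return true;
}
```

In this model a region of 0x20000 bytes at 0xf0000 is trimmed to 0x10000
bytes at 1M. A region that ends exactly at 1M is not skipped by the first
check and comes out with a size of zero; what the real
memblock_free_late() does with a zero-sized range is not modelled here.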
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index b15ebfe40a73..be814f2089ff 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -409,7 +409,6 @@ void __init efi_free_boot_services(void)
for_each_efi_memory_desc(md) {
unsigned long long start = md->phys_addr;
unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
- size_t rm_size;
if (md->type != EFI_BOOT_SERVICES_CODE &&
md->type != EFI_BOOT_SERVICES_DATA) {
@@ -430,26 +429,6 @@ void __init efi_free_boot_services(void)
*/
efi_unmap_pages(md);
- /*
- * Nasty quirk: if all sub-1MB memory is used for boot
- * services, we can get here without having allocated the
- * real mode trampoline. It's too late to hand boot services
- * memory back to the memblock allocator, so instead
- * try to manually allocate the trampoline if needed.
- *
- * I've seen this on a Dell XPS 13 9350 with firmware
- * 1.4.4 with SGX enabled booting Linux via Fedora 24's
- * grub2-efi on a hard disk. (And no, I don't know why
- * this happened, but Linux should still try to boot rather
- * panicking early.)
- */
- rm_size = real_mode_size_needed();
- if (rm_size && (start + rm_size) < (1<<20) && size >= rm_size) {
- set_real_mode_mem(start);
- start += rm_size;
- size -= rm_size;
- }
-
/*
* Don't free memory under 1M for two reasons:
* - BIOS might clobber it
>
> + /*
> + * Don't free memory under 1M for two reasons:
> + * - BIOS might clobber it
> + * - Crash kernel needs it to be reserved
> + */
> + if (start + size < SZ_1M)
> + continue;
> + if (start < SZ_1M) {
> + size -= (SZ_1M - start);
> + start = SZ_1M;
> + }
> +
> memblock_free_late(start, size);
> }
>
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 2e1c1bec0f9e..8ea285aca827 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -29,14 +29,16 @@ void __init reserve_real_mode(void)
>
> /* Has to be under 1M so we can execute real-mode AP code. */
> mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
> - if (!mem) {
> + if (!mem)
> pr_info("No sub-1M memory is available for the trampoline\n");
> - return;
> - }
> + else
> + set_real_mode_mem(mem);
>
> - memblock_reserve(mem, size);
> - set_real_mode_mem(mem);
> - crash_reserve_low_1M();
> + /*
> + * Unconditionally reserve the entire first 1M, see comment in
> + * setup_arch()
> + */
> + memblock_reserve(0, SZ_1M);
> }
>
> static void sme_sev_setup_real_mode(struct trampoline_header *th)
> --
> 2.28.0
>
Hi Baoquan,
On Tue, Jun 01, 2021 at 05:06:53PM +0800, Baoquan He wrote:
> On 06/01/21 at 10:53am, Mike Rapoport wrote:
> > From: Mike Rapoport <[email protected]>
> ......
>
> > diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> > index 7850111008a8..b15ebfe40a73 100644
> > --- a/arch/x86/platform/efi/quirks.c
> > +++ b/arch/x86/platform/efi/quirks.c
> > @@ -450,6 +450,18 @@ void __init efi_free_boot_services(void)
> > size -= rm_size;
> > }
>
> Thanks for taking care of excluding the low 1M in
> efi_free_boot_services(), Mike. You might also want to remove the old real
> mode exclusion code, since it is now covered by your new code.
Unfortunately I can't, because it's important that set_real_mode_mem() can
reuse memory that was occupied by EFI boot services and that is being freed
here.
According to the changelog of 5bc653b73182 ("x86/efi: Allocate a trampoline
if needed in efi_free_boot_services()"), that system has EBDA at 0x2c000, so
we reserve everything from 0x2c000 to 0xa0000 in reserve_bios_regions() and
most of the memory below 0x2c000 is used by EFI boot data. So with such a
memory layout reserve_real_mode() won't be able to allocate the trampoline.
Yet, when the EFI boot data is freed, the room occupied by it will be reused
by the real mode trampoline via set_real_mode_mem().
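The layout above can be sketched as a small userspace model. Several
things here are assumptions for illustration only: the EFI boot data is
taken to fill everything between 64K and the EBDA, the area from 0xa0000
to 1M is treated as unavailable, the search is bottom-up, and
toy_find_trampoline() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K 0x1000ULL
#define SZ_1M 0x100000ULL

struct toy_range { uint64_t start, end; };	/* half-open [start, end) */

enum { EFI_BOOT_DATA = 1 };	/* index of the EFI entry in layout[] */

static const struct toy_range layout[] = {
	{ 0x00000, 0x10000 },	/* first 64K, reserved early */
	{ 0x10000, 0x2c000 },	/* EFI boot data (assumed extent) */
	{ 0x2c000, 0xa0000 },	/* EBDA and above, reserve_bios_regions() */
	{ 0xa0000, 0x100000 },	/* video/BIOS area (assumed unavailable) */
};

/*
 * Lowest page-aligned free block of 'size' bytes under 1M, or 0 if none.
 * With efi_data_freed set, the EFI boot data entry is ignored, as if its
 * memory had been released in efi_free_boot_services().
 */
static uint64_t toy_find_trampoline(uint64_t size, bool efi_data_freed)
{
	const int nr = sizeof(layout) / sizeof(layout[0]);

	assert(size > 0);

	for (uint64_t addr = 0; addr + size <= SZ_1M; addr += SZ_4K) {
		bool is_free = true;

		for (int i = 0; i < nr; i++) {
			if (efi_data_freed && i == EFI_BOOT_DATA)
				continue;
			if (addr < layout[i].end &&
			    layout[i].start < addr + size)
				is_free = false;
		}
		if (is_free)
			return addr;
	}
	return 0;
}
```

With a 0x5000-byte trampoline the search fails while the EFI boot data is
still in place and succeeds at 0x10000 once it is treated as freed, which
is why the fallback in efi_free_boot_services() has to stay.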
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index b15ebfe40a73..be814f2089ff 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -409,7 +409,6 @@ void __init efi_free_boot_services(void)
> for_each_efi_memory_desc(md) {
> unsigned long long start = md->phys_addr;
> unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> - size_t rm_size;
>
> if (md->type != EFI_BOOT_SERVICES_CODE &&
> md->type != EFI_BOOT_SERVICES_DATA) {
> @@ -430,26 +429,6 @@ void __init efi_free_boot_services(void)
> */
> efi_unmap_pages(md);
>
> - /*
> - * Nasty quirk: if all sub-1MB memory is used for boot
> - * services, we can get here without having allocated the
> - * real mode trampoline. It's too late to hand boot services
> - * memory back to the memblock allocator, so instead
> - * try to manually allocate the trampoline if needed.
> - *
> - * I've seen this on a Dell XPS 13 9350 with firmware
> - * 1.4.4 with SGX enabled booting Linux via Fedora 24's
> - * grub2-efi on a hard disk. (And no, I don't know why
> - * this happened, but Linux should still try to boot rather
> - * panicking early.)
> - */
> - rm_size = real_mode_size_needed();
> - if (rm_size && (start + rm_size) < (1<<20) && size >= rm_size) {
> - set_real_mode_mem(start);
> - start += rm_size;
> - size -= rm_size;
> - }
> -
> /*
> * Don't free memory under 1M for two reasons:
> * - BIOS might clobber it
>
> >
> > + /*
> > + * Don't free memory under 1M for two reasons:
> > + * - BIOS might clobber it
> > + * - Crash kernel needs it to be reserved
> > + */
> > + if (start + size < SZ_1M)
> > + continue;
> > + if (start < SZ_1M) {
> > + size -= (SZ_1M - start);
> > + start = SZ_1M;
> > + }
> > +
> > memblock_free_late(start, size);
> > }
> >
> > diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> > index 2e1c1bec0f9e..8ea285aca827 100644
> > --- a/arch/x86/realmode/init.c
> > +++ b/arch/x86/realmode/init.c
> > @@ -29,14 +29,16 @@ void __init reserve_real_mode(void)
> >
> > /* Has to be under 1M so we can execute real-mode AP code. */
> > mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
> > - if (!mem) {
> > + if (!mem)
> > pr_info("No sub-1M memory is available for the trampoline\n");
> > - return;
> > - }
> > + else
> > + set_real_mode_mem(mem);
> >
> > - memblock_reserve(mem, size);
> > - set_real_mode_mem(mem);
> > - crash_reserve_low_1M();
> > + /*
> > + * Unconditionally reserve the entire first 1M, see comment in
> > + * setup_arch()
> > + */
> > + memblock_reserve(0, SZ_1M);
> > }
> >
> > static void sme_sev_setup_real_mode(struct trampoline_header *th)
> > --
> > 2.28.0
> >
>
--
Sincerely yours,
Mike.
On Tue, 1 Jun 2021, Mike Rapoport wrote:
>
> Randy, Hugh, I'd appreciate if you give this a whirl on your old Sandy
> Bridge laptops as it changes again the way trim_snb_memory() works.
Boots and runs fine here, i386 or x86_64: thanks for remembering us!
Hugh
On Tue, Jun 01, 2021 at 10:53:52AM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> There are BIOSes that are known to corrupt the memory under 1M, or more
> precisely under 640K because the memory above 640K is anyway reserved for
> the EGA/VGA frame buffer and BIOS.
>
> To prevent the kernel from using memory that may be clobbered by the BIOS,
> the beginning of the memory is always reserved. The exact size of the
> reserved area is determined by the CONFIG_X86_RESERVE_LOW build time option
> and the reservelow command line option. The reserved range may be from 4K
> to 640K with the default of 64K. There are also configurations that reserve
> the entire 1M range, like machines with SandyBridge graphics devices or
> systems that enable crash kernel.
>
> In addition to the potentially clobbered memory, EBDA of unknown size may
> be as low as 128K and the memory above that EBDA start is also reserved
> early.
>
> It would have been possible to reserve the entire range under 1M if it were
> not for the real mode trampoline, which must reside in that area.
>
> To accommodate placement of the real mode trampoline and keep the memory
> safe from being clobbered by BIOS, reserve the first 64K of RAM before
> memory allocations are possible and then, after the real mode trampoline is
> allocated, reserve the entire range from 0 to 1M.
>
> Update trim_snb_memory() and reserve_real_mode() to avoid redundant
> reservations of the same memory range.
>
> Also make sure the memory under 1M is not getting freed by
> efi_free_boot_services().
>
> Fixes: a799c2bd29d1 ("x86/setup: Consolidate early memory reservations")
> Signed-off-by: Mike Rapoport <[email protected]>
> ---
> arch/x86/kernel/setup.c | 35 ++++++++++++++++++++--------------
> arch/x86/platform/efi/quirks.c | 12 ++++++++++++
> arch/x86/realmode/init.c | 14 ++++++++------
> 3 files changed, 41 insertions(+), 20 deletions(-)
Ok, let's try it. Booting on a couple of boxes looks ok here, the
difference is visible:
- DMA zone: 30 pages reserved
+ DMA zone: 159 pages reserved
On the other box, it was already reserving so many pages even before
DMA zone: 159 pages reserved
i.e., the first 640K.
But it's not like I had problems before with early reservations so my
testing doesn't mean a whole lot. Hugh's testing sounds good, lemme add
his tag too.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: f1d4d47c5851b348b7713007e152bc68b94d728b
Gitweb: https://git.kernel.org/tip/f1d4d47c5851b348b7713007e152bc68b94d728b
Author: Mike Rapoport <[email protected]>
AuthorDate: Tue, 01 Jun 2021 10:53:52 +03:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 03 Jun 2021 19:57:55 +02:00
x86/setup: Always reserve the first 1M of RAM
There are BIOSes that are known to corrupt the memory under 1M, or more
precisely under 640K because the memory above 640K is anyway reserved
for the EGA/VGA frame buffer and BIOS.
To prevent the kernel from using memory that may be clobbered by the
BIOS, the beginning of the memory is always reserved. The exact size
of the reserved area is determined by the CONFIG_X86_RESERVE_LOW build
time option and the "reservelow=" command line option. The reserved range may be
from 4K to 640K with the default of 64K. There are also configurations
that reserve the entire 1M range, like machines with SandyBridge graphic
devices or systems that enable crash kernel.
In addition to the potentially clobbered memory, EBDA of unknown size may
be as low as 128K and the memory above that EBDA start is also reserved
early.
It would have been possible to reserve the entire range under 1M if it were
not for the real mode trampoline, which must reside in that area.
To accommodate placement of the real mode trampoline and keep the memory
safe from being clobbered by BIOS, reserve the first 64K of RAM before
memory allocations are possible and then, after the real mode trampoline
is allocated, reserve the entire range from 0 to 1M.
Update trim_snb_memory() and reserve_real_mode() to avoid redundant
reservations of the same memory range.
Also make sure the memory under 1M is not getting freed by
efi_free_boot_services().
[ bp: Massage commit message and comments. ]
Fixes: a799c2bd29d1 ("x86/setup: Consolidate early memory reservations")
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Tested-by: Hugh Dickins <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213177
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/setup.c | 35 +++++++++++++++++++--------------
arch/x86/platform/efi/quirks.c | 12 +++++++++++-
arch/x86/realmode/init.c | 14 +++++++------
3 files changed, 41 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ff653d6..1e72062 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -638,11 +638,11 @@ static void __init trim_snb_memory(void)
* them from accessing certain memory ranges, namely anything below
* 1M and in the pages listed in bad_pages[] above.
*
- * To avoid these pages being ever accessed by SNB gfx devices
- * reserve all memory below the 1 MB mark and bad_pages that have
- * not already been reserved at boot time.
+ * To avoid these pages being ever accessed by SNB gfx devices reserve
+ * bad_pages that have not already been reserved at boot time.
+ * All memory below the 1 MB mark is anyway reserved later during
+ * setup_arch(), so there is no need to reserve it here.
*/
- memblock_reserve(0, 1<<20);
for (i = 0; i < ARRAY_SIZE(bad_pages); i++) {
if (memblock_reserve(bad_pages[i], PAGE_SIZE))
@@ -734,14 +734,14 @@ static void __init early_reserve_memory(void)
* The first 4Kb of memory is a BIOS owned area, but generally it is
* not listed as such in the E820 table.
*
- * Reserve the first memory page and typically some additional
- * memory (64KiB by default) since some BIOSes are known to corrupt
- * low memory. See the Kconfig help text for X86_RESERVE_LOW.
+ * Reserve the first 64K of memory since some BIOSes are known to
+ * corrupt low memory. After the real mode trampoline is allocated the
+ * rest of the memory below 640k is reserved.
*
* In addition, make sure page 0 is always reserved because on
* systems with L1TF its contents can be leaked to user processes.
*/
- memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
+ memblock_reserve(0, SZ_64K);
early_reserve_initrd();
@@ -752,6 +752,7 @@ static void __init early_reserve_memory(void)
reserve_ibft_region();
reserve_bios_regions();
+ trim_snb_memory();
}
/*
@@ -1082,14 +1083,20 @@ void __init setup_arch(char **cmdline_p)
(max_pfn_mapped<<PAGE_SHIFT) - 1);
#endif
- reserve_real_mode();
-
/*
- * Reserving memory causing GPU hangs on Sandy Bridge integrated
- * graphics devices should be done after we allocated memory under
- * 1M for the real mode trampoline.
+ * Find free memory for the real mode trampoline and place it
+ * there.
+ * If there is not enough free memory under 1M, on EFI-enabled
+ * systems there will be additional attempt to reclaim the memory
+ * for the real mode trampoline at efi_free_boot_services().
+ *
+ * Unconditionally reserve the entire first 1M of RAM because
+ * BIOSes are known to corrupt low memory and several
+ * hundred kilobytes are not worth complex detection what memory gets
+ * clobbered. Moreover, on machines with SandyBridge graphics or in
+ * setups that use crashkernel the entire 1M is reserved anyway.
*/
- trim_snb_memory();
+ reserve_real_mode();
init_mem_mapping();
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 7850111..b15ebfe 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -450,6 +450,18 @@ void __init efi_free_boot_services(void)
size -= rm_size;
}
+ /*
+ * Don't free memory under 1M for two reasons:
+ * - BIOS might clobber it
+ * - Crash kernel needs it to be reserved
+ */
+ if (start + size < SZ_1M)
+ continue;
+ if (start < SZ_1M) {
+ size -= (SZ_1M - start);
+ start = SZ_1M;
+ }
+
memblock_free_late(start, size);
}
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 2e1c1be..6534c92 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -29,14 +29,16 @@ void __init reserve_real_mode(void)
/* Has to be under 1M so we can execute real-mode AP code. */
mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
- if (!mem) {
+ if (!mem)
pr_info("No sub-1M memory is available for the trampoline\n");
- return;
- }
+ else
+ set_real_mode_mem(mem);
- memblock_reserve(mem, size);
- set_real_mode_mem(mem);
- crash_reserve_low_1M();
+ /*
+ * Unconditionally reserve the entire first 1M, see comment in
+ * setup_arch().
+ */
+ memblock_reserve(0, SZ_1M);
}
static void sme_sev_setup_real_mode(struct trampoline_header *th)
The following commit has been merged into the x86/cleanups branch of tip:
Commit-ID: 23721c8e92f73f9f89e7362c50c2996a5c9ad483
Gitweb: https://git.kernel.org/tip/23721c8e92f73f9f89e7362c50c2996a5c9ad483
Author: Mike Rapoport <[email protected]>
AuthorDate: Tue, 01 Jun 2021 10:53:54 +03:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Mon, 07 Jun 2021 12:14:45 +02:00
x86/crash: Remove crash_reserve_low_1M()
The entire memory range under 1M is unconditionally reserved in
setup_arch(), so there is no need for crash_reserve_low_1M() anymore.
Remove this function.
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/include/asm/crash.h | 6 ------
arch/x86/kernel/crash.c | 13 -------------
2 files changed, 19 deletions(-)
diff --git a/arch/x86/include/asm/crash.h b/arch/x86/include/asm/crash.h
index f58de66..8b6bd63 100644
--- a/arch/x86/include/asm/crash.h
+++ b/arch/x86/include/asm/crash.h
@@ -9,10 +9,4 @@ int crash_setup_memmap_entries(struct kimage *image,
struct boot_params *params);
void crash_smp_send_stop(void);
-#ifdef CONFIG_KEXEC_CORE
-void __init crash_reserve_low_1M(void);
-#else
-static inline void __init crash_reserve_low_1M(void) { }
-#endif
-
#endif /* _ASM_X86_CRASH_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 54ce999..e8326a8 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -70,19 +70,6 @@ static inline void cpu_crash_vmclear_loaded_vmcss(void)
rcu_read_unlock();
}
-/*
- * When the crashkernel option is specified, only use the low
- * 1M for the real mode trampoline.
- */
-void __init crash_reserve_low_1M(void)
-{
- if (cmdline_find_option(boot_command_line, "crashkernel", NULL, 0) < 0)
- return;
-
- memblock_reserve(0, 1<<20);
- pr_info("Reserving the low 1M of memory for crashkernel\n");
-}
-
#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
The following commit has been merged into the x86/cleanups branch of tip:
Commit-ID: 1a6a9044b96729abacede172d7591c714a5b81d1
Gitweb: https://git.kernel.org/tip/1a6a9044b96729abacede172d7591c714a5b81d1
Author: Mike Rapoport <[email protected]>
AuthorDate: Tue, 01 Jun 2021 10:53:53 +03:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Mon, 07 Jun 2021 11:12:25 +02:00
x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
The CONFIG_X86_RESERVE_LOW build time option and the reservelow= command
line option controlled the amount of memory under 1M that would be
reserved at boot to avoid using memory that can potentially be clobbered
by BIOS.
Since the entire range under 1M is now always reserved, there is no need
for these options anymore and they can be removed.
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
Documentation/admin-guide/kernel-parameters.txt | 5 +---
arch/x86/Kconfig | 29 +----------------
arch/x86/kernel/setup.c | 24 +-------------
3 files changed, 58 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cb89dbd..d7d8130 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4775,11 +4775,6 @@
Reserves a hole at the top of the kernel virtual
address space.
- reservelow= [X86]
- Format: nn[K]
- Set the amount of memory to reserve for BIOS at
- the bottom of the address space.
-
reset_devices [KNL] Force drivers to reset the underlying device
during initialization.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0045e1b..86dae42 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1693,35 +1693,6 @@ config X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK
Set whether the default state of memory_corruption_check is
on or off.
-config X86_RESERVE_LOW
- int "Amount of low memory, in kilobytes, to reserve for the BIOS"
- default 64
- range 4 640
- help
- Specify the amount of low memory to reserve for the BIOS.
-
- The first page contains BIOS data structures that the kernel
- must not use, so that page must always be reserved.
-
- By default we reserve the first 64K of physical RAM, as a
- number of BIOSes are known to corrupt that memory range
- during events such as suspend/resume or monitor cable
- insertion, so it must not be used by the kernel.
-
- You can set this to 4 if you are absolutely sure that you
- trust the BIOS to get all its memory reservations and usages
- right. If you know your BIOS have problems beyond the
- default 64K area, you can set this to 640 to avoid using the
- entire low memory range.
-
- If you have doubts about the BIOS (e.g. suspend/resume does
- not work or there's kernel crashes after certain hardware
- hotplug events) then you might want to enable
- X86_CHECK_BIOS_CORRUPTION=y to allow the kernel to check
- typical corruption patterns.
-
- Leave this to the default value of 64 if you are unsure.
-
config MATH_EMULATION
bool
depends on MODIFY_LDT_SYSCALL
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1e72062..7638ac6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -695,30 +695,6 @@ static void __init e820_add_kernel_range(void)
e820__range_add(start, size, E820_TYPE_RAM);
}
-static unsigned reserve_low = CONFIG_X86_RESERVE_LOW << 10;
-
-static int __init parse_reservelow(char *p)
-{
- unsigned long long size;
-
- if (!p)
- return -EINVAL;
-
- size = memparse(p, &p);
-
- if (size < 4096)
- size = 4096;
-
- if (size > 640*1024)
- size = 640*1024;
-
- reserve_low = size;
-
- return 0;
-}
-
-early_param("reservelow", parse_reservelow);
-
static void __init early_reserve_memory(void)
{
/*
On 6/1/21 12:53 AM, Mike Rapoport wrote:
> There are BIOSes that are known to corrupt the memory under 1M, or more
> precisely under 640K because the memory above 640K is anyway reserved for
> the EGA/VGA frame buffer and BIOS.
Should there have been a Cc: stable@ on this?
Seems like the kind of thing we'd want backported.
On Thu, Jul 01, 2021 at 10:15:29AM -0700, Dave Hansen wrote:
> On 6/1/21 12:53 AM, Mike Rapoport wrote:
> > There are BIOSes that are known to corrupt the memory under 1M, or more
> > precisely under 640K because the memory above 640K is anyway reserved for
> > the EGA/VGA frame buffer and BIOS.
>
> Should there have been a Cc: stable@ on this?
>
> Seems like the kind of thing we'd want backported.
The commit this patch is fixing (a799c2bd29d1) went to v5.13-rc1, so there
is no need to backport it.
--
Sincerely yours,
Mike.
On Thu, Jun 3, 2021, at 11:01 AM, tip-bot2 for Mike Rapoport wrote:
> The following commit has been merged into the x86/urgent branch of tip:
>
> Commit-ID: f1d4d47c5851b348b7713007e152bc68b94d728b
> Gitweb:
> https://git.kernel.org/tip/f1d4d47c5851b348b7713007e152bc68b94d728b
> Author: Mike Rapoport <[email protected]>
> AuthorDate: Tue, 01 Jun 2021 10:53:52 +03:00
> Committer: Borislav Petkov <[email protected]>
> CommitterDate: Thu, 03 Jun 2021 19:57:55 +02:00
>
> x86/setup: Always reserve the first 1M of RAM
>
> There are BIOSes that are known to corrupt the memory under 1M, or more
> precisely under 640K because the memory above 640K is anyway reserved
> for the EGA/VGA frame buffer and BIOS.
>
> To prevent usage of the memory that will be potentially clobbered by the
> kernel, the beginning of the memory is always reserved. The exact size
> of the reserved area is determined by CONFIG_X86_RESERVE_LOW build time
> and the "reservelow=" command line option. The reserved range may be
> from 4K to 640K with the default of 64K. There are also configurations
> that reserve the entire 1M range, like machines with SandyBridge graphic
> devices or systems that enable crash kernel.
>
> In addition to the potentially clobbered memory, EBDA of unknown size may
> be as low as 128K and the memory above that EBDA start is also reserved
> early.
>
> It would have been possible to reserve the entire range under 1M unless for
> the real mode trampoline that must reside in that area.
>
> To accommodate placement of the real mode trampoline and keep the memory
> safe from being clobbered by BIOS, reserve the first 64K of RAM before
> memory allocations are possible and then, after the real mode trampoline
> is allocated, reserve the entire range from 0 to 1M.
>
> Update trim_snb_memory() and reserve_real_mode() to avoid redundant
> reservations of the same memory range.
>
> Also make sure the memory under 1M is not getting freed by
> efi_free_boot_services().
This is quite broken. The comments in the patch seem to understand that Linux tries twice to allocate the real mode trampoline, but the code has some issues.
First, it actively breaks the logic here:
+ /*
+ * Don't free memory under 1M for two reasons:
+ * - BIOS might clobber it
+ * - Crash kernel needs it to be reserved
+ */
+ if (start + size < SZ_1M)
+ continue;
+ if (start < SZ_1M) {
+ size -= (SZ_1M - start);
+ start = SZ_1M;
+ }
+
The whole point is that, if we fail to allocate a trampoline, we free boot services and try again. But if we can't free boot services below 1M, then we can't allocate a trampoline in boot services memory. And then it does:
+ /*
+ * Unconditionally reserve the entire fisrt 1M, see comment in
+ * setup_arch().
+ */
+ memblock_reserve(0, SZ_1M);
But this runs even if we just failed to allocate a trampoline on the first try, again dooming the kernel to panic.
I read the commit message and the linked bug, and I'm having trouble finding evidence of anything actually fixed by this patch. Can we just revert it? If not, it would be nice to get a fixup patch that genuinely cleans this up -- the whole structure of the code (first, try to allocate trampoline, then free boot services, then try again) isn't really conducive to a model where we *don't* free boot services < 1M.
Discovered by my delightful laptop, which does not boot with this patch applied.
--Andy
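For readers following along outside the kernel tree, the clipping in the hunk quoted above can be reproduced as a standalone sketch. The `struct region` type and the `clip_below_1m()` name are made up for illustration; only the two conditions and the arithmetic are taken from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M 0x100000ULL

/* Hypothetical stand-in for one EFI boot services region. */
struct region {
	uint64_t start;
	uint64_t size;
};

/*
 * Same conditions as the quoted hunk. Returns false when the region ends
 * below 1M and therefore stays reserved; otherwise trims whatever part
 * lies below 1M and returns true, meaning the remainder may be freed.
 */
bool clip_below_1m(struct region *r)
{
	/* The arithmetic below assumes the range does not wrap around. */
	assert(r->start + r->size >= r->start);

	if (r->start + r->size < SZ_1M)
		return false;

	if (r->start < SZ_1M) {
		r->size -= SZ_1M - r->start;
		r->start = SZ_1M;
	}
	return true;
}
```

Note that, because the first comparison is `<` rather than `<=`, a region ending exactly at 1M is not skipped by this logic; it is trimmed to a zero-sized range starting at 1M.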
On Wed, Mar 01, 2023 at 07:51:43PM -0800, Andy Lutomirski wrote:
> This is quite broken. The comments in the patch seem to understand
> that Linux tries twice to allocate the real mode trampoline, but the
> code has some issues.
>
> First, it actively breaks the logic here:
>
> + /*
> + * Don't free memory under 1M for two reasons:
> + * - BIOS might clobber it
> + * - Crash kernel needs it to be reserved
> + */
> + if (start + size < SZ_1M)
> + continue;
> + if (start < SZ_1M) {
> + size -= (SZ_1M - start);
> + start = SZ_1M;
> + }
> +
Are you referring, perchance, here to your comment in that same function
a bit higher?
Introduced by this thing here:
5bc653b73182 ("x86/efi: Allocate a trampoline if needed in efi_free_boot_services()")
?
Also, it looks like Mike did pay attention to your commit:
https://lore.kernel.org/all/[email protected]/
And then there's the whole deal with kdump kernel needing lowmem. The
function which became obsolete and got removed by:
23721c8e92f7 ("x86/crash: Remove crash_reserve_low_1M()")
So, considering how yours is the only report that breaks booting and
this reservation of <=1M has been out there for ~2 years without any
complaints, I'm thinking what we should do now is fix that logic.
Btw, this whole effort started with
a799c2bd29d1 ("x86/setup: Consolidate early memory reservations")
Also see this:
ec35d1d93bf8 ("x86/setup: Document that Windows reserves the first MiB")
and with shit like that, we're "piggybacking" on Windoze since there
certification happens at least.
Which begs the question: how does your laptop even boot on windoze if
windoze reserves that 1M too?!
> I read the commit message and the linked bug, and I'm having trouble
> finding evidence of anything actually fixed by this patch. Can we
> just revert it? If not, it would be nice to get a fixup patch that
> genuinely cleans this up -- the whole structure of the code (first,
> try to allocate trampoline, then free boot services, then try again)
> isn't really conducive to a model where we *don't* free boot services
> < 1M.
Yes, I think this makes most sense. And that whole area is a minefield
so the less we upset the current universe, the better.
> Discovered by my delightful laptop, which does not boot with this patch applied.
How come your laptop hasn't booted new Linux since then?!? Tztztztz
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Mar 2, 2023, at 2:50 AM, Borislav Petkov wrote:
> On Wed, Mar 01, 2023 at 07:51:43PM -0800, Andy Lutomirski wrote:
>> This is quite broken. The comments in the patch seem to understand
>> that Linux tries twice to allocate the real mode trampoline, but the
>> code has some issues.
>>
>> First, it actively breaks the logic here:
>>
>> + /*
>> + * Don't free memory under 1M for two reasons:
>> + * - BIOS might clobber it
>> + * - Crash kernel needs it to be reserved
>> + */
>> + if (start + size < SZ_1M)
>> + continue;
>> + if (start < SZ_1M) {
>> + size -= (SZ_1M - start);
>> + start = SZ_1M;
>> + }
>> +
>
> Are you referring, perchance, here to your comment in that same function
> a bit higher?
>
> Introduced by this thing here:
>
> 5bc653b73182 ("x86/efi: Allocate a trampoline if needed in
> efi_free_boot_services()")
>
> ?
Yes.
>
> Also, it looks like Mike did pay attention to your commit:
>
> https://lore.kernel.org/all/[email protected]/
He definitely did. But I'm still pretty sure the patch in question broke it :-/
>
> And then there's the whole deal with kdump kernel needing lowmem. The
> function which became obsolete and got removed by:
>
> 23721c8e92f7 ("x86/crash: Remove crash_reserve_low_1M()")
>
> So, considering how yours is the only report that breaks booting and
> this reservation of <=1M has been out there for ~2 years without any
> complaints, I'm thinking what we should do now is fix that logic.
>
> Btw, this whole effort started with
>
> a799c2bd29d1 ("x86/setup: Consolidate early memory reservations")
>
> Also see this:
>
> ec35d1d93bf8 ("x86/setup: Document that Windows reserves the first MiB")
>
> and with shit like that, we're "piggybacking" on Windoze since there
> certification happens at least.
>
> Which begs the question: how does your laptop even boot on windoze if
> windoze reserves that 1M too?!
I haven't booted Windoze on this thing in years. But...
There is no possible way that Windoze genuinely reserves the first 1M. It does SMP, and x86 needs <1M memory for SMP, so Windoze uses <1M memory. QED :)
>
>> I read the commit message and the linked bug, and I'm having trouble
>> finding evidence of anything actually fixed by this patch. Can we
>> just revert it? If not, it would be nice to get a fixup patch that
>> genuinely cleans this up -- the whole structure of the code (first,
>> try to allocate trampoline, then free boot services, then try again)
>> isn't really conducive to a model where we *don't* free boot services
>> < 1M.
>
> Yes, I think this makes most sense. And that whole area is a minefield
> so the less we upset the current universe, the better.
I'll send a revert patch.
Thinking about this a bit more, if we actually want to "reserve" <1M, we should implement it completely differently by treating <1M as its very own special thing and teaching the memblock allocator to refuse to allocate <1M unless specifically requested. There's only a very small number of allocations that need it (crashkernel for some reason?), and there are at least two spurious users of memblock_phys_alloc_range that currently may use <1M but have no business doing so (ramdisk code and the NUMA distance table). But let's only do that if there's an actual problem to solve.
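To make the idea concrete, here is a toy model of an allocator that keeps <1M off limits unless the caller asks for it. This is not memblock code: `struct free_range`, `toy_alloc()` and the `want_low` flag are invented for this sketch, and alignment is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1M 0x100000ULL

/* Hypothetical free range [start, end); not a real memblock type. */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Toy first-fit allocator. Ordinary requests (want_low == false) are only
 * satisfied at or above 1M; requests with want_low == true are only
 * satisfied below 1M. Returns true and stores the address in *out on
 * success.
 */
bool toy_alloc(struct free_range *r, size_t n, uint64_t size,
	       bool want_low, uint64_t *out)
{
	assert(size > 0);

	for (size_t i = 0; i < n; i++) {
		if (want_low) {
			uint64_t limit = r[i].end < SZ_1M ? r[i].end : SZ_1M;

			/* Bottom-up, staying entirely below 1M. */
			if (r[i].start < limit && limit - r[i].start >= size) {
				*out = r[i].start;
				r[i].start += size;
				return true;
			}
		} else {
			uint64_t floor = r[i].start > SZ_1M ? r[i].start : SZ_1M;

			/* Top-down, never dipping below 1M. */
			if (r[i].end > floor && r[i].end - floor >= size) {
				r[i].end -= size;
				*out = r[i].end;
				return true;
			}
		}
	}
	return false;
}
```

With a policy like this, an ordinary caller simply fails once memory above 1M runs out, even if low memory is still free, while something like the real mode trampoline would pass the explicit flag.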
>
>> Discovered by my delightful laptop, which does not boot with this patch applied.
>
> How come your laptop hasn't booted new Linux since then?!? Tztztztz
Honestly, no clue. Looking at the logs, I'm pretty sure I *did* boot an affected (6.0) kernel. The actual problematic memory map on this laptop seems to show up a bit inconsistently as some horrible combination of firmware settings (especially SGX) and who-knows-what else. My best guess is that a GRUB update I installed yesterday caused some tiny memory map change that triggered it.
I did install a new kernel yesterday too, but the *previous* kernel stopped booting too.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Mar 02, 2023 at 07:06:11AM -0800, Andy Lutomirski wrote:
> There is no possible way that Windoze genuinely reserves the first 1M.
> It does SMP, and x86 needs <1M memory for SMP, so Windoze uses <1M
> memory. QED :)
Then we need to sort this out first. Because this patch says the
contrary.
> >> I read the commit message and the linked bug, and I'm having trouble
> >> finding evidence of anything actually fixed by this patch. Can we
> >> just revert it? If not, it would be nice to get a fixup patch that
> >> genuinely cleans this up -- the whole structure of the code (first,
> >> try to allocate trampoline, then free boot services, then try again)
> >> isn't really conducive to a model where we *don't* free boot services
> >> < 1M.
> >
> > Yes, I think this makes most sense. And that whole area is a minefield
> > so the less we upset the current universe, the better.
>
> I'll send a revert patch.
I actually replied to the text which spoke about a "fixup patch" - not
a revert patch.
> Thinking about this a bit more, if we actually want to "reserve" <1M,
> we should implement it completely differently by treating <1M as its
> very own special thing and teaching the memblock allocator to refuse
> to allocate <1M unless specifically requested. There's only a very
> small number of allocations that need it (crashkernel for some
> reason?), and there are at least two spurious users of
> memblock_phys_alloc_range that currently may use <1M but have no
> business doing so (ramdisk code and the NUMA distance table). But
> let's only do that if there's an actual problem to solve.
No, look at early_reserve_memory(). All kinds of crap use that <1M and
we do special reservations there.
I agree with making it a special region aspect.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Mar 1, 2023, at 7:51 PM, Andy Lutomirski wrote:
> On Thu, Jun 3, 2021, at 11:01 AM, tip-bot2 for Mike Rapoport wrote:
>> The following commit has been merged into the x86/urgent branch of tip:
>>
>> Commit-ID: f1d4d47c5851b348b7713007e152bc68b94d728b
>> Gitweb:
>> https://git.kernel.org/tip/f1d4d47c5851b348b7713007e152bc68b94d728b
>> Author: Mike Rapoport <[email protected]>
>> AuthorDate: Tue, 01 Jun 2021 10:53:52 +03:00
>> Committer: Borislav Petkov <[email protected]>
>> CommitterDate: Thu, 03 Jun 2021 19:57:55 +02:00
>>
>> x86/setup: Always reserve the first 1M of RAM
>>
>> There are BIOSes that are known to corrupt the memory under 1M, or more
>> precisely under 640K because the memory above 640K is anyway reserved
>> for the EGA/VGA frame buffer and BIOS.
>>
>> To prevent usage of the memory that will be potentially clobbered by the
>> kernel, the beginning of the memory is always reserved. The exact size
>> of the reserved area is determined by CONFIG_X86_RESERVE_LOW build time
>> and the "reservelow=" command line option. The reserved range may be
>> from 4K to 640K with the default of 64K. There are also configurations
>> that reserve the entire 1M range, like machines with SandyBridge graphic
>> devices or systems that enable crash kernel.
>>
>> In addition to the potentially clobbered memory, EBDA of unknown size may
>> be as low as 128K and the memory above that EBDA start is also reserved
>> early.
>>
>> It would have been possible to reserve the entire range under 1M unless for
>> the real mode trampoline that must reside in that area.
>>
>> To accommodate placement of the real mode trampoline and keep the memory
>> safe from being clobbered by BIOS, reserve the first 64K of RAM before
>> memory allocations are possible and then, after the real mode trampoline
>> is allocated, reserve the entire range from 0 to 1M.
>>
>> Update trim_snb_memory() and reserve_real_mode() to avoid redundant
>> reservations of the same memory range.
>>
>> Also make sure the memory under 1M is not getting freed by
>> efi_free_boot_services().
>
> This is quite broken. The comments in the patch seem to understand
> that Linux tries twice to allocate the real mode trampoline, but the
> code has some issues.
>
>
> First, it actively breaks the logic here:
>
>
> + /*
> + * Don't free memory under 1M for two reasons:
> + * - BIOS might clobber it
> + * - Crash kernel needs it to be reserved
> + */
> + if (start + size < SZ_1M)
> + continue;
> + if (start < SZ_1M) {
> + size -= (SZ_1M - start);
> + start = SZ_1M;
> + }
> +
>
>
> The whole point is that, if we fail to allocate a trampoline, we free
> boot services and try again. But if we can't free boot services below
> 1M, then we can't allocate a trampoline in boot services memory. And
> then it does:
>
>
> + /*
> + * Unconditionally reserve the entire fisrt 1M, see comment in
> + * setup_arch().
> + */
> + memblock_reserve(0, SZ_1M);
>
My apologies, I misread this thing. The patch is *not* obviously buggy, but something is buggy. I'll keep investigating...
--Andy
Hi Andy,
On Wed, Mar 01, 2023 at 07:51:43PM -0800, Andy Lutomirski wrote:
> On Thu, Jun 3, 2021, at 11:01 AM, tip-bot2 for Mike Rapoport wrote:
> >
> > x86/setup: Always reserve the first 1M of RAM
> >
...
> + /*
> + * Unconditionally reserve the entire fisrt 1M, see comment in
> + * setup_arch().
> + */
> + memblock_reserve(0, SZ_1M);
>
>
> But this runs even if we just failed to allocate a trampoline on the
> first try, again dooming the kernel to panic.
>
> I read the commit message and the linked bug, and I'm having trouble
> finding evidence of anything actually fixed by this patch. Can we just
> revert it? If not, it would be nice to get a fixup patch that genuinely
> cleans this up -- the whole structure of the code (first, try to allocate
> trampoline, then free boot services, then try again) isn't really
> conducive to a model where we *don't* free boot services < 1M.
Currently, the second attempt to set_real_mode_mem() in
efi_free_boot_services() does not allocate from memblock anyway but reuses
memory freed from EFI services. Could it be that the failure to boot is
caused by another failing reservation?
> Discovered by my delightful laptop, which does not boot with this patch applied.
Do you have early_printk() visible?
> --Andy
--
Sincerely yours,
Mike.
On Fri, Mar 3, 2023, at 1:10 AM, Mike Rapoport wrote:
> Hi Andy,
>
> On Wed, Mar 01, 2023 at 07:51:43PM -0800, Andy Lutomirski wrote:
>> On Thu, Jun 3, 2021, at 11:01 AM, tip-bot2 for Mike Rapoport wrote:
>> >
>> > x86/setup: Always reserve the first 1M of RAM
>> >
>
> ...
>
>> + /*
>> + * Unconditionally reserve the entire fisrt 1M, see comment in
>> + * setup_arch().
>> + */
>> + memblock_reserve(0, SZ_1M);
>>
>>
>> But this runs even if we just failed to allocate a trampoline on the
>> first try, again dooming the kernel to panic.
>>
>> I read the commit message and the linked bug, and I'm having trouble
>> finding evidence of anything actually fixed by this patch. Can we just
>> revert it? If not, it would be nice to get a fixup patch that genuinely
>> cleans this up -- the whole structure of the code (first, try to allocate
>> trampoline, then free boot services, then try again) isn't really
>> conducive to a model where we *don't* free boot services < 1M.
>
> Currently, the second attempt to set_real_mode_mem() in
> efi_free_boot_services() does not allocate from memblock anyway but reuses
> memory freed from EFI services. Could it be that the failure to boot is
> caused by another failing reservation?
I'm not actually sure what's wrong per se. Certainly efi=debug will utterly break my quirk, but other than that, I would have expected it to still work on a more careful reading.
Anyway, I have a fixup series that works in a VM that I'll test in a bit.
>
>> Discovered by my delightful laptop, which does not boot with this patch applied.
>
> Do you have early_printk() visible?
Yes, but I haven't found a smoking gun yet.
>
>> --Andy
>
> --
> Sincerely yours,
> Mike.