Hi,
my notebook doesn't boot with current kernel. Booting stops right after
displaying "loading initial ramdisk..". No further displays.
Also nothing is wriiten to the logs.
last known good kernel is : vmlinuz-5.5.0-00849-gb0be0eff1a5a
first known bad kernel is : vmlinuz-5.5.0-01154-gc677124e631d
Thanks, Jörg
On Thu, Jan 30, 2020 at 9:32 AM Jörg Otte <[email protected]> wrote:
>
> my notebook doesn't boot with current kernel. Booting stops right after
> displaying "loading initial ramdisk..". No further displays.
> Also nothing is wriiten to the logs.
>
> last known good kernel is : vmlinuz-5.5.0-00849-gb0be0eff1a5a
> first known bad kernel is : vmlinuz-5.5.0-01154-gc677124e631d
It would be lovely if you can bisect a bit. But my merges in that
range are all from Ingo:
Ingo Molnar (7):
header cleanup
objtool updates
RCU updates
EFI updates
locking updates
perf updates
scheduler updates
but not having any messages at all makes it hard to guess where it would be.
A few bisect runs would narrow it down a fair amount. Bisecting all
the way would be even better, of course,
Linus
> On Jan 30, 2020, at 10:08 AM, Linus Torvalds <[email protected]> wrote:
>
> On Thu, Jan 30, 2020 at 9:32 AM Jörg Otte <[email protected]> wrote:
>>
>> my notebook doesn't boot with current kernel. Booting stops right after
>> displaying "loading initial ramdisk..". No further displays.
>> Also nothing is wriiten to the logs.
>>
>> last known good kernel is : vmlinuz-5.5.0-00849-gb0be0eff1a5a
>> first known bad kernel is : vmlinuz-5.5.0-01154-gc677124e631d
>
> It would be lovely if you can bisect a bit. But my merges in that
> range are all from Ingo:
>
> Ingo Molnar (7):
> header cleanup
> objtool updates
> RCU updates
> EFI updates
> locking updates
> perf updates
> scheduler updates
>
> but not having any messages at all makes it hard to guess where it would be.
>
> A few bisect runs would narrow it down a fair amount. Bisecting all
> the way would be even better, of course,
>
>
It would also be nice to know: are you EFI-booting or BIOS-booting? And are you using EFI mixed mode? (That is, are you booting a 64-bit kernel using 32-bit EFI?)
* Linus Torvalds <[email protected]> wrote:
> On Thu, Jan 30, 2020 at 9:32 AM J?rg Otte <[email protected]> wrote:
> >
> > my notebook doesn't boot with current kernel. Booting stops right after
> > displaying "loading initial ramdisk..". No further displays.
> > Also nothing is wriiten to the logs.
> >
> > last known good kernel is : vmlinuz-5.5.0-00849-gb0be0eff1a5a
> > first known bad kernel is : vmlinuz-5.5.0-01154-gc677124e631d
>
> It would be lovely if you can bisect a bit. But my merges in that
> range are all from Ingo:
>
> Ingo Molnar (7):
> header cleanup
> objtool updates
> RCU updates
> EFI updates
> locking updates
> perf updates
> scheduler updates
If I had to guess then perhaps the EFI changes look the most dangerous
ones from these trees - but in principle most of these trees could
contain a boot crasher/hang bug.
> but not having any messages at all makes it hard to guess where it
> would be.
To improve debug output:
Removing any 'fbcon' options in /boot/grub/grub.cfg and adding this to
the boot options might improve the debug output:
earlyprintk=vga initcall_debug ignore_loglevel debug panic_on_warn
So for example if the relevant kernel boot entry in grub.cfg looks like
this:
linux /vmlinuz-5.3.0-26-generic root=UUID=1bcxabe3-0b62-4x04-b456-47cd90c0e6x4 ro splash $vt_handoff
Then editing it to the following could in principle produce (much) more
verbose boot output:
linux /vmlinuz-5.3.0-26-generic root=UUID=1bcxabe3-0b62-4x04-b456-47cd90c0e6x4 ro earlyprintk=vga initcall_debug ignore_loglevel debug panic_on_warn $vt_handoff
If this produces more output than just "loading initial ramdisk..' then a
photo of the hung screen would be sufficient, no need to transcribe it.
> A few bisect runs would narrow it down a fair amount. Bisecting all the
> way would be even better, of course,
Agreed!
If compiling full kernels for bisections takes too long (for example
because the .config is from a distro kernel) then running "make
localmodconfig" to create a config tailored to the currently active
modules will cut down significantly on build time.
Also, a warning: if the normal boot log contains spurious warnings then
the new 'panic_on_warn' option will cause additional trouble on good
kernels.
Thanks,
Ingo
Am Fr., 31. Jan. 2020 um 07:43 Uhr schrieb Ingo Molnar <[email protected]>:
>
>
> * Linus Torvalds <[email protected]> wrote:
>
> > On Thu, Jan 30, 2020 at 9:32 AM Jörg Otte <[email protected]> wrote:
> > >
> > > my notebook doesn't boot with current kernel. Booting stops right after
> > > displaying "loading initial ramdisk..". No further displays.
> > > Also nothing is wriiten to the logs.
> > >
> > > last known good kernel is : vmlinuz-5.5.0-00849-gb0be0eff1a5a
> > > first known bad kernel is : vmlinuz-5.5.0-01154-gc677124e631d
> >
> > It would be lovely if you can bisect a bit. But my merges in that
> > range are all from Ingo:
> >
> > Ingo Molnar (7):
> > header cleanup
> > objtool updates
> > RCU updates
> > EFI updates
> > locking updates
> > perf updates
> > scheduler updates
>
> If I had to guess then perhaps the EFI changes look the most dangerous
> ones from these trees - but in principle most of these trees could
> contain a boot crasher/hang bug.
>
> > but not having any messages at all makes it hard to guess where it
> > would be.
>
> To improve debug output:
>
> Removing any 'fbcon' options in /boot/grub/grub.cfg and adding this to
> the boot options might improve the debug output:
>
> earlyprintk=vga initcall_debug ignore_loglevel debug panic_on_warn
>
> So for example if the relevant kernel boot entry in grub.cfg looks like
> this:
>
> linux /vmlinuz-5.3.0-26-generic root=UUID=1bcxabe3-0b62-4x04-b456-47cd90c0e6x4 ro splash $vt_handoff
>
> Then editing it to the following could in principle produce (much) more
> verbose boot output:
>
> linux /vmlinuz-5.3.0-26-generic root=UUID=1bcxabe3-0b62-4x04-b456-47cd90c0e6x4 ro earlyprintk=vga initcall_debug ignore_loglevel debug panic_on_warn $vt_handoff
>
> If this produces more output than just "loading initial ramdisk..' then a
> photo of the hung screen would be sufficient, no need to transcribe it.
>
> > A few bisect runs would narrow it down a fair amount. Bisecting all the
> > way would be even better, of course,
>
> Agreed!
>
> If compiling full kernels for bisections takes too long (for example
> because the .config is from a distro kernel) then running "make
> localmodconfig" to create a config tailored to the currently active
> modules will cut down significantly on build time.
>
> Also, a warning: if the normal boot log contains spurious warnings then
> the new 'panic_on_warn' option will cause additional trouble on good
> kernels.
It's bisected.
The first bad commit is :
1db91035d01aa8bfa2350c00ccb63d629b4041ad
efi: Add tracking for dynamically allocated memmaps
Unfortunately I can not revert because of compile errors!
In file included from /media/jojo/deftoshiba/kernel/linux/init/main.c:48:
/media/jojo/deftoshiba/kernel/linux/include/linux/efi.h:975:1: error:
version control conflict marker in file
<<<<<<< HEAD
^~~~~~~
/media/jojo/deftoshiba/kernel/linux/include/linux/efi.h:980:1: error:
version control conflict marker in file
=======
^~~~~~~
/media/jojo/deftoshiba/kernel/linux/include/linux/efi.h:982:1: error:
version control conflict marker in file
>>>>>>> parent of 1db91035d01a... efi: Add tracking for dynamically allocated memmaps
....
(Cc:ed Dan and Ard)
* J?rg Otte <[email protected]> wrote:
> Am Fr., 31. Jan. 2020 um 07:43 Uhr schrieb Ingo Molnar <[email protected]>:
> >
> >
> > * Linus Torvalds <[email protected]> wrote:
> >
> > > On Thu, Jan 30, 2020 at 9:32 AM J?rg Otte <[email protected]> wrote:
> > > >
> > > > my notebook doesn't boot with current kernel. Booting stops right after
> > > > displaying "loading initial ramdisk..". No further displays.
> > > > Also nothing is wriiten to the logs.
> > > >
> > > > last known good kernel is : vmlinuz-5.5.0-00849-gb0be0eff1a5a
> > > > first known bad kernel is : vmlinuz-5.5.0-01154-gc677124e631d
> > >
> > > It would be lovely if you can bisect a bit. But my merges in that
> > > range are all from Ingo:
> > >
> > > Ingo Molnar (7):
> > > header cleanup
> > > objtool updates
> > > RCU updates
> > > EFI updates
> > > locking updates
> > > perf updates
> > > scheduler updates
> >
> > If I had to guess then perhaps the EFI changes look the most dangerous
> > ones from these trees - but in principle most of these trees could
> > contain a boot crasher/hang bug.
> >
> > > but not having any messages at all makes it hard to guess where it
> > > would be.
> >
> > To improve debug output:
> >
> > Removing any 'fbcon' options in /boot/grub/grub.cfg and adding this to
> > the boot options might improve the debug output:
> >
> > earlyprintk=vga initcall_debug ignore_loglevel debug panic_on_warn
> >
> > So for example if the relevant kernel boot entry in grub.cfg looks like
> > this:
> >
> > linux /vmlinuz-5.3.0-26-generic root=UUID=1bcxabe3-0b62-4x04-b456-47cd90c0e6x4 ro splash $vt_handoff
> >
> > Then editing it to the following could in principle produce (much) more
> > verbose boot output:
> >
> > linux /vmlinuz-5.3.0-26-generic root=UUID=1bcxabe3-0b62-4x04-b456-47cd90c0e6x4 ro earlyprintk=vga initcall_debug ignore_loglevel debug panic_on_warn $vt_handoff
> >
> > If this produces more output than just "loading initial ramdisk..' then a
> > photo of the hung screen would be sufficient, no need to transcribe it.
> >
> > > A few bisect runs would narrow it down a fair amount. Bisecting all the
> > > way would be even better, of course,
> >
> > Agreed!
> >
> > If compiling full kernels for bisections takes too long (for example
> > because the .config is from a distro kernel) then running "make
> > localmodconfig" to create a config tailored to the currently active
> > modules will cut down significantly on build time.
> >
> > Also, a warning: if the normal boot log contains spurious warnings then
> > the new 'panic_on_warn' option will cause additional trouble on good
> > kernels.
>
> It's bisected.
> The first bad commit is :
> 1db91035d01aa8bfa2350c00ccb63d629b4041ad
> efi: Add tracking for dynamically allocated memmaps
Thanks a ton, that's very useful!
I've Cc:-ed the EFI gents who are developing this code, maybe they'll
spot the bug.
Thanks,
Ingo
On Fri, Jan 31, 2020 at 10:37 AM Ingo Molnar <[email protected]> wrote:
>
>
> (Cc:ed Dan and Ard)
>
> * Jörg Otte <[email protected]> wrote:
>
> > Am Fr., 31. Jan. 2020 um 07:43 Uhr schrieb Ingo Molnar <[email protected]>:
> > >
> > >
> > > * Linus Torvalds <[email protected]> wrote:
> > >
> > > > On Thu, Jan 30, 2020 at 9:32 AM Jörg Otte <[email protected]> wrote:
> > > > >
> > > > > my notebook doesn't boot with current kernel. Booting stops right after
> > > > > displaying "loading initial ramdisk..". No further displays.
> > > > > Also nothing is wriiten to the logs.
> > > > >
> > > > > last known good kernel is : vmlinuz-5.5.0-00849-gb0be0eff1a5a
> > > > > first known bad kernel is : vmlinuz-5.5.0-01154-gc677124e631d
> > > >
> > > > It would be lovely if you can bisect a bit. But my merges in that
> > > > range are all from Ingo:
> > > >
> > > > Ingo Molnar (7):
> > > > header cleanup
> > > > objtool updates
> > > > RCU updates
> > > > EFI updates
> > > > locking updates
> > > > perf updates
> > > > scheduler updates
> > >
> > > If I had to guess then perhaps the EFI changes look the most dangerous
> > > ones from these trees - but in principle most of these trees could
> > > contain a boot crasher/hang bug.
> > >
> > > > but not having any messages at all makes it hard to guess where it
> > > > would be.
> > >
> > > To improve debug output:
> > >
> > > Removing any 'fbcon' options in /boot/grub/grub.cfg and adding this to
> > > the boot options might improve the debug output:
> > >
> > > earlyprintk=vga initcall_debug ignore_loglevel debug panic_on_warn
> > >
> > > So for example if the relevant kernel boot entry in grub.cfg looks like
> > > this:
> > >
> > > linux /vmlinuz-5.3.0-26-generic root=UUID=1bcxabe3-0b62-4x04-b456-47cd90c0e6x4 ro splash $vt_handoff
> > >
> > > Then editing it to the following could in principle produce (much) more
> > > verbose boot output:
> > >
> > > linux /vmlinuz-5.3.0-26-generic root=UUID=1bcxabe3-0b62-4x04-b456-47cd90c0e6x4 ro earlyprintk=vga initcall_debug ignore_loglevel debug panic_on_warn $vt_handoff
> > >
> > > If this produces more output than just "loading initial ramdisk..' then a
> > > photo of the hung screen would be sufficient, no need to transcribe it.
> > >
> > > > A few bisect runs would narrow it down a fair amount. Bisecting all the
> > > > way would be even better, of course,
> > >
> > > Agreed!
> > >
> > > If compiling full kernels for bisections takes too long (for example
> > > because the .config is from a distro kernel) then running "make
> > > localmodconfig" to create a config tailored to the currently active
> > > modules will cut down significantly on build time.
> > >
> > > Also, a warning: if the normal boot log contains spurious warnings then
> > > the new 'panic_on_warn' option will cause additional trouble on good
> > > kernels.
> >
> > It's bisected.
> > The first bad commit is :
> > 1db91035d01aa8bfa2350c00ccb63d629b4041ad
> > efi: Add tracking for dynamically allocated memmaps
>
> Thanks a ton, that's very useful!
>
> I've Cc:-ed the EFI gents who are developing this code, maybe they'll
> spot the bug.
I'll take a look. Jörg, can you paste a full dmesg from a good boot?
J?rg Otte wrote:
> It's bisected.
> The first bad commit is :
> 1db91035d01aa8bfa2350c00ccb63d629b4041ad
> efi: Add tracking for dynamically allocated memmaps
> Unfortunately I can not revert because of compile errors!
>
> In file included from /media/jojo/deftoshiba/kernel/linux/init/main.c:48:
> /media/jojo/deftoshiba/kernel/linux/include/linux/efi.h:975:1: error:
> version control conflict marker in file
> <<<<<<< HEAD
So 1db91035d0 doesn't revert cleanly, because 484a418d0754 depends on it,
plus there a third commit (f0ef6523475f) that has a semantic dependency
on 1db91035d01a.
But you can revert them all, if done in reverse chronological order:
git revert f0ef6523475f # ("efi: Fix efi_memmap_alloc() leaks")
git revert 484a418d0754 # ("efi: Fix handling of multiple efi_fake_mem= entries")
git revert 1db91035d01a # ("efi: Add tracking for dynamically allocated memmaps")
You can paste those three lines into a shell as-is, or you can apply the
patch below which is a combination of these three reverts.
J?rg, can you confirm that doing these reverts fixes booting on your
system? If it does then a full dmesg from that kernel would be useful.
FWIW I reviewed the bisected commit and didn't find the bug but I also
couldn't convince myself it's a 1:1 identity transformation and cleanup
of the existing logic.
The size arithmethics transformation looks correct at first sight, but
the data->flags handling in particular looks rather interwoven.
Thanks,
Ingo
============================>
arch/x86/platform/efi/efi.c | 10 ++----
arch/x86/platform/efi/quirks.c | 23 +++++++------
drivers/firmware/efi/fake_mem.c | 43 ++++++++++++------------
drivers/firmware/efi/memmap.c | 72 ++++++++++++-----------------------------
include/linux/efi.h | 13 +++-----
5 files changed, 63 insertions(+), 98 deletions(-)
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 59f7f6d60cf6..4e46d2d24352 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -304,16 +304,10 @@ static void __init efi_clean_memmap(void)
}
if (n_removal > 0) {
- struct efi_memory_map_data data = {
- .phys_map = efi.memmap.phys_map,
- .desc_version = efi.memmap.desc_version,
- .desc_size = efi.memmap.desc_size,
- .size = data.desc_size * (efi.memmap.nr_map - n_removal),
- .flags = 0,
- };
+ u64 size = efi.memmap.nr_map - n_removal;
pr_warn("Removing %d invalid memory map entries.\n", n_removal);
- efi_memmap_install(&data);
+ efi_memmap_install(efi.memmap.phys_map, size);
}
}
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 88d32c06cffa..86b44289ac55 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -244,7 +244,7 @@ EXPORT_SYMBOL_GPL(efi_query_variable_store);
*/
void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
{
- struct efi_memory_map_data data = { 0 };
+ phys_addr_t new_phys, new_size;
struct efi_mem_range mr;
efi_memory_desc_t md;
int num_entries;
@@ -272,21 +272,24 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
num_entries = efi_memmap_split_count(&md, &mr.range);
num_entries += efi.memmap.nr_map;
- if (efi_memmap_alloc(num_entries, &data) != 0) {
+ new_size = efi.memmap.desc_size * num_entries;
+
+ new_phys = efi_memmap_alloc(num_entries);
+ if (!new_phys) {
pr_err("Could not allocate boot services memmap\n");
return;
}
- new = early_memremap(data.phys_map, data.size);
+ new = early_memremap(new_phys, new_size);
if (!new) {
pr_err("Failed to map new boot services memmap\n");
return;
}
efi_memmap_insert(&efi.memmap, new, &mr);
- early_memunmap(new, data.size);
+ early_memunmap(new, new_size);
- efi_memmap_install(&data);
+ efi_memmap_install(new_phys, num_entries);
e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
e820__update_table(e820_table);
}
@@ -405,7 +408,7 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)
void __init efi_free_boot_services(void)
{
- struct efi_memory_map_data data = { 0 };
+ phys_addr_t new_phys, new_size;
efi_memory_desc_t *md;
int num_entries = 0;
void *new, *new_md;
@@ -460,12 +463,14 @@ void __init efi_free_boot_services(void)
if (!num_entries)
return;
- if (efi_memmap_alloc(num_entries, &data) != 0) {
+ new_size = efi.memmap.desc_size * num_entries;
+ new_phys = efi_memmap_alloc(num_entries);
+ if (!new_phys) {
pr_err("Failed to allocate new EFI memmap\n");
return;
}
- new = memremap(data.phys_map, data.size, MEMREMAP_WB);
+ new = memremap(new_phys, new_size, MEMREMAP_WB);
if (!new) {
pr_err("Failed to map new EFI memmap\n");
return;
@@ -489,7 +494,7 @@ void __init efi_free_boot_services(void)
memunmap(new);
- if (efi_memmap_install(&data) != 0) {
+ if (efi_memmap_install(new_phys, num_entries)) {
pr_err("Could not install new EFI memmap\n");
return;
}
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 6e0f34a38171..bb9fc70d0cfa 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -34,45 +34,46 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
return 0;
}
-static void __init efi_fake_range(struct efi_mem_range *efi_range)
+void __init efi_fake_memmap(void)
{
- struct efi_memory_map_data data = { 0 };
int new_nr_map = efi.memmap.nr_map;
efi_memory_desc_t *md;
+ phys_addr_t new_memmap_phy;
void *new_memmap;
+ int i;
+
+ if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
+ return;
/* count up the number of EFI memory descriptor */
- for_each_efi_memory_desc(md)
- new_nr_map += efi_memmap_split_count(md, &efi_range->range);
+ for (i = 0; i < nr_fake_mem; i++) {
+ for_each_efi_memory_desc(md) {
+ struct range *r = &efi_fake_mems[i].range;
+
+ new_nr_map += efi_memmap_split_count(md, r);
+ }
+ }
/* allocate memory for new EFI memmap */
- if (efi_memmap_alloc(new_nr_map, &data) != 0)
+ new_memmap_phy = efi_memmap_alloc(new_nr_map);
+ if (!new_memmap_phy)
return;
/* create new EFI memmap */
- new_memmap = early_memremap(data.phys_map, data.size);
+ new_memmap = early_memremap(new_memmap_phy,
+ efi.memmap.desc_size * new_nr_map);
if (!new_memmap) {
- __efi_memmap_free(data.phys_map, data.size, data.flags);
+ memblock_free(new_memmap_phy, efi.memmap.desc_size * new_nr_map);
return;
}
- efi_memmap_insert(&efi.memmap, new_memmap, efi_range);
+ for (i = 0; i < nr_fake_mem; i++)
+ efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]);
/* swap into new EFI memmap */
- early_memunmap(new_memmap, data.size);
-
- efi_memmap_install(&data);
-}
-
-void __init efi_fake_memmap(void)
-{
- int i;
+ early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
- if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
- return;
-
- for (i = 0; i < nr_fake_mem; i++)
- efi_fake_range(&efi_fake_mems[i]);
+ efi_memmap_install(new_memmap_phy, new_nr_map);
/* print new EFI memmap */
efi_print_memmap();
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 2ff1883dc788..4f98eb12c172 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -29,32 +29,9 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
return PFN_PHYS(page_to_pfn(p));
}
-void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
-{
- if (flags & EFI_MEMMAP_MEMBLOCK) {
- if (slab_is_available())
- memblock_free_late(phys, size);
- else
- memblock_free(phys, size);
- } else if (flags & EFI_MEMMAP_SLAB) {
- struct page *p = pfn_to_page(PHYS_PFN(phys));
- unsigned int order = get_order(size);
-
- free_pages((unsigned long) page_address(p), order);
- }
-}
-
-static void __init efi_memmap_free(void)
-{
- __efi_memmap_free(efi.memmap.phys_map,
- efi.memmap.desc_size * efi.memmap.nr_map,
- efi.memmap.flags);
-}
-
/**
* efi_memmap_alloc - Allocate memory for the EFI memory map
* @num_entries: Number of entries in the allocated map.
- * @data: efi memmap installation parameters
*
* Depending on whether mm_init() has already been invoked or not,
* either memblock or "normal" page allocation is used.
@@ -62,29 +39,14 @@ static void __init efi_memmap_free(void)
* Returns the physical address of the allocated memory map on
* success, zero on failure.
*/
-int __init efi_memmap_alloc(unsigned int num_entries,
- struct efi_memory_map_data *data)
+phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
{
- /* Expect allocation parameters are zero initialized */
- WARN_ON(data->phys_map || data->size);
-
- data->size = num_entries * efi.memmap.desc_size;
- data->desc_version = efi.memmap.desc_version;
- data->desc_size = efi.memmap.desc_size;
- data->flags &= ~(EFI_MEMMAP_SLAB | EFI_MEMMAP_MEMBLOCK);
- data->flags |= efi.memmap.flags & EFI_MEMMAP_LATE;
-
- if (slab_is_available()) {
- data->flags |= EFI_MEMMAP_SLAB;
- data->phys_map = __efi_memmap_alloc_late(data->size);
- } else {
- data->flags |= EFI_MEMMAP_MEMBLOCK;
- data->phys_map = __efi_memmap_alloc_early(data->size);
- }
+ unsigned long size = num_entries * efi.memmap.desc_size;
- if (!data->phys_map)
- return -ENOMEM;
- return 0;
+ if (slab_is_available())
+ return __efi_memmap_alloc_late(size);
+
+ return __efi_memmap_alloc_early(size);
}
/**
@@ -102,7 +64,8 @@ int __init efi_memmap_alloc(unsigned int num_entries,
*
* Returns zero on success, a negative error code on failure.
*/
-static int __init __efi_memmap_init(struct efi_memory_map_data *data)
+static int __init
+__efi_memmap_init(struct efi_memory_map_data *data)
{
struct efi_memory_map map;
phys_addr_t phys_map;
@@ -122,9 +85,6 @@ static int __init __efi_memmap_init(struct efi_memory_map_data *data)
return -ENOMEM;
}
- /* NOP if data->flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB) == 0 */
- efi_memmap_free();
-
map.phys_map = data->phys_map;
map.nr_map = data->size / data->desc_size;
map.map_end = map.map + data->size;
@@ -224,7 +184,8 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
/**
* efi_memmap_install - Install a new EFI memory map in efi.memmap
- * @ctx: map allocation parameters (address, size, flags)
+ * @addr: Physical address of the memory map
+ * @nr_map: Number of entries in the memory map
*
* Unlike efi_memmap_init_*(), this function does not allow the caller
* to switch from early to late mappings. It simply uses the existing
@@ -232,11 +193,20 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
*
* Returns zero on success, a negative error code on failure.
*/
-int __init efi_memmap_install(struct efi_memory_map_data *data)
+int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
{
+ struct efi_memory_map_data data;
+ unsigned long flags;
+
efi_memmap_unmap();
- return __efi_memmap_init(data);
+ data.phys_map = addr;
+ data.size = efi.memmap.desc_size * nr_map;
+ data.desc_version = efi.memmap.desc_version;
+ data.desc_size = efi.memmap.desc_size;
+ data.flags = efi.memmap.flags & EFI_MEMMAP_LATE;
+
+ return __efi_memmap_init(&data);
}
/**
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 7efd7072cca5..f117d68c314e 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -759,8 +759,8 @@ typedef union {
/*
* Architecture independent structure for describing a memory map for the
- * benefit of efi_memmap_init_early(), and for passing context between
- * efi_memmap_alloc() and efi_memmap_install().
+ * benefit of efi_memmap_init_early(), saving us the need to pass four
+ * parameters.
*/
struct efi_memory_map_data {
phys_addr_t phys_map;
@@ -778,8 +778,6 @@ struct efi_memory_map {
unsigned long desc_version;
unsigned long desc_size;
#define EFI_MEMMAP_LATE (1UL << 0)
-#define EFI_MEMMAP_MEMBLOCK (1UL << 1)
-#define EFI_MEMMAP_SLAB (1UL << 2)
unsigned long flags;
};
@@ -974,14 +972,11 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
#endif
extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
-extern int __init efi_memmap_alloc(unsigned int num_entries,
- struct efi_memory_map_data *data);
-extern void __efi_memmap_free(u64 phys, unsigned long size,
- unsigned long flags);
+extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries);
extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
extern void __init efi_memmap_unmap(void);
-extern int __init efi_memmap_install(struct efi_memory_map_data *data);
+extern int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map);
extern int __init efi_memmap_split_count(efi_memory_desc_t *md,
struct range *range);
extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap,
Am Fr., 31. Jan. 2020 um 19:48 Uhr schrieb Dan Williams
<[email protected]>:
>
> On Fri, Jan 31, 2020 at 10:37 AM Ingo Molnar <[email protected]> wrote:
> >
> >
> > (Cc:ed Dan and Ard)
> >
> > * Jörg Otte <[email protected]> wrote:
> >
> > > Am Fr., 31. Jan. 2020 um 07:43 Uhr schrieb Ingo Molnar <[email protected]>:
> > > >
> > > >
> > > > * Linus Torvalds <[email protected]> wrote:
> > > >
> > > > > On Thu, Jan 30, 2020 at 9:32 AM Jörg Otte <[email protected]> wrote:
> > > > > >
> > > > > > my notebook doesn't boot with current kernel. Booting stops right after
> > > > > > displaying "loading initial ramdisk..". No further displays.
> > > > > > Also nothing is wriiten to the logs.
> > > > > >
> > > > > > last known good kernel is : vmlinuz-5.5.0-00849-gb0be0eff1a5a
> > > > > > first known bad kernel is : vmlinuz-5.5.0-01154-gc677124e631d
> > > > >
> > > > > It would be lovely if you can bisect a bit. But my merges in that
> > > > > range are all from Ingo:
> > > > >
> > > > > Ingo Molnar (7):
> > > > > header cleanup
> > > > > objtool updates
> > > > > RCU updates
> > > > > EFI updates
> > > > > locking updates
> > > > > perf updates
> > > > > scheduler updates
> > > >
> > > > If I had to guess then perhaps the EFI changes look the most dangerous
> > > > ones from these trees - but in principle most of these trees could
> > > > contain a boot crasher/hang bug.
> > > >
> > > > > but not having any messages at all makes it hard to guess where it
> > > > > would be.
> > > >
> > > > To improve debug output:
> > > >
> > > > Removing any 'fbcon' options in /boot/grub/grub.cfg and adding this to
> > > > the boot options might improve the debug output:
> > > >
> > > > earlyprintk=vga initcall_debug ignore_loglevel debug panic_on_warn
> > > >
> > > > So for example if the relevant kernel boot entry in grub.cfg looks like
> > > > this:
> > > >
> > > > linux /vmlinuz-5.3.0-26-generic root=UUID=1bcxabe3-0b62-4x04-b456-47cd90c0e6x4 ro splash $vt_handoff
> > > >
> > > > Then editing it to the following could in principle produce (much) more
> > > > verbose boot output:
> > > >
> > > > linux /vmlinuz-5.3.0-26-generic root=UUID=1bcxabe3-0b62-4x04-b456-47cd90c0e6x4 ro earlyprintk=vga initcall_debug ignore_loglevel debug panic_on_warn $vt_handoff
> > > >
> > > > If this produces more output than just "loading initial ramdisk..' then a
> > > > photo of the hung screen would be sufficient, no need to transcribe it.
> > > >
> > > > > A few bisect runs would narrow it down a fair amount. Bisecting all the
> > > > > way would be even better, of course,
> > > >
> > > > Agreed!
> > > >
> > > > If compiling full kernels for bisections takes too long (for example
> > > > because the .config is from a distro kernel) then running "make
> > > > localmodconfig" to create a config tailored to the currently active
> > > > modules will cut down significantly on build time.
> > > >
> > > > Also, a warning: if the normal boot log contains spurious warnings then
> > > > the new 'panic_on_warn' option will cause additional trouble on good
> > > > kernels.
> > >
> > > It's bisected.
> > > The first bad commit is :
> > > 1db91035d01aa8bfa2350c00ccb63d629b4041ad
> > > efi: Add tracking for dynamically allocated memmaps
> >
> > Thanks a ton, that's very useful!
> >
> > I've Cc:-ed the EFI gents who are developing this code, maybe they'll
> > spot the bug.
>
> I'll take a look. Jörg, can you paste a full dmesg from a good boot?
Here it is.
Thanks, Jörg
On Sat, 1 Feb 2020 at 16:35, Jörg Otte <[email protected]> wrote:
>
> Am Fr., 31. Jan. 2020 um 19:48 Uhr schrieb Dan Williams
> <[email protected]>:
> >
> > On Fri, Jan 31, 2020 at 10:37 AM Ingo Molnar <[email protected]> wrote:
> > >
> > >
> > > (Cc:ed Dan and Ard)
> > >
> > > * Jörg Otte <[email protected]> wrote:
> > >
> > > > Am Fr., 31. Jan. 2020 um 07:43 Uhr schrieb Ingo Molnar <[email protected]>:
> > > > >
> > > > >
> > > > > * Linus Torvalds <[email protected]> wrote:
> > > > >
> > > > > > On Thu, Jan 30, 2020 at 9:32 AM Jörg Otte <[email protected]> wrote:
> > > > > > >
> > > > > > > my notebook doesn't boot with current kernel. Booting stops right after
> > > > > > > displaying "loading initial ramdisk..". No further displays.
> > > > > > > Also nothing is wriiten to the logs.
> > > > > > >
> > > > > > > last known good kernel is : vmlinuz-5.5.0-00849-gb0be0eff1a5a
> > > > > > > first known bad kernel is : vmlinuz-5.5.0-01154-gc677124e631d
> > > > > >
> > > > > > It would be lovely if you can bisect a bit. But my merges in that
> > > > > > range are all from Ingo:
> > > > > >
> > > > > > Ingo Molnar (7):
> > > > > > header cleanup
> > > > > > objtool updates
> > > > > > RCU updates
> > > > > > EFI updates
> > > > > > locking updates
> > > > > > perf updates
> > > > > > scheduler updates
> > > > >
> > > > > If I had to guess then perhaps the EFI changes look the most dangerous
> > > > > ones from these trees - but in principle most of these trees could
> > > > > contain a boot crasher/hang bug.
> > > > >
> > > > > > but not having any messages at all makes it hard to guess where it
> > > > > > would be.
> > > > >
> > > > > To improve debug output:
> > > > >
> > > > > Removing any 'fbcon' options in /boot/grub/grub.cfg and adding this to
> > > > > the boot options might improve the debug output:
> > > > >
> > > > > earlyprintk=vga initcall_debug ignore_loglevel debug panic_on_warn
> > > > >
> > > > > So for example if the relevant kernel boot entry in grub.cfg looks like
> > > > > this:
> > > > >
> > > > > linux /vmlinuz-5.3.0-26-generic root=UUID=1bcxabe3-0b62-4x04-b456-47cd90c0e6x4 ro splash $vt_handoff
> > > > >
> > > > > Then editing it to the following could in principle produce (much) more
> > > > > verbose boot output:
> > > > >
> > > > > linux /vmlinuz-5.3.0-26-generic root=UUID=1bcxabe3-0b62-4x04-b456-47cd90c0e6x4 ro earlyprintk=vga initcall_debug ignore_loglevel debug panic_on_warn $vt_handoff
> > > > >
> > > > > If this produces more output than just "loading initial ramdisk..' then a
> > > > > photo of the hung screen would be sufficient, no need to transcribe it.
> > > > >
> > > > > > A few bisect runs would narrow it down a fair amount. Bisecting all the
> > > > > > way would be even better, of course,
> > > > >
> > > > > Agreed!
> > > > >
> > > > > If compiling full kernels for bisections takes too long (for example
> > > > > because the .config is from a distro kernel) then running "make
> > > > > localmodconfig" to create a config tailored to the currently active
> > > > > modules will cut down significantly on build time.
> > > > >
> > > > > Also, a warning: if the normal boot log contains spurious warnings then
> > > > > the new 'panic_on_warn' option will cause additional trouble on good
> > > > > kernels.
> > > >
> > > > It's bisected.
> > > > The first bad commit is :
> > > > 1db91035d01aa8bfa2350c00ccb63d629b4041ad
> > > > efi: Add tracking for dynamically allocated memmaps
> > >
> > > Thanks a ton, that's very useful!
> > >
> > > I've Cc:-ed the EFI gents who are developing this code, maybe they'll
> > > spot the bug.
> >
> > I'll take a look. Jörg, can you paste a full dmesg from a good boot?
>
> Here it is.
>
Hello Jörg,
Could you please try whether the change below fixes the issue?
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 59f7f6d60cf6..ae923ee8e2b4 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -308,7 +308,7 @@ static void __init efi_clean_memmap(void)
.phys_map = efi.memmap.phys_map,
.desc_version = efi.memmap.desc_version,
.desc_size = efi.memmap.desc_size,
- .size = data.desc_size * (efi.memmap.nr_map -
n_removal),
+ .size = efi.memmap.desc_size *
(efi.memmap.nr_map - n_removal),
.flags = 0,
};
On Sat, Feb 1, 2020 at 1:45 AM Ingo Molnar <[email protected]> wrote:
>
>
>
> Jörg Otte wrote:
>
> > It's bisected.
> > The first bad commit is :
> > 1db91035d01aa8bfa2350c00ccb63d629b4041ad
> > efi: Add tracking for dynamically allocated memmaps
>
> > Unfortunately I can not revert because of compile errors!
> >
> > In file included from /media/jojo/deftoshiba/kernel/linux/init/main.c:48:
> > /media/jojo/deftoshiba/kernel/linux/include/linux/efi.h:975:1: error:
> > version control conflict marker in file
> > <<<<<<< HEAD
>
> So 1db91035d0 doesn't revert cleanly, because 484a418d0754 depends on it,
> plus there a third commit (f0ef6523475f) that has a semantic dependency
> on 1db91035d01a.
>
> But you can revert them all, if done in reverse chronological order:
>
> git revert f0ef6523475f # ("efi: Fix efi_memmap_alloc() leaks")
> git revert 484a418d0754 # ("efi: Fix handling of multiple efi_fake_mem= entries")
> git revert 1db91035d01a # ("efi: Add tracking for dynamically allocated memmaps")
>
> You can paste those three lines into a shell as-is, or you can apply the
> patch below which is a combination of these three reverts.
>
> Jörg, can you confirm that doing these reverts fixes booting on your
> system? If it does then a full dmesg from that kernel would be useful.
>
> FWIW I reviewed the bisected commit and didn't find the bug but I also
> couldn't convince myself it's a 1:1 identity transformation and cleanup
> of the existing logic.
>
> The size arithmethics transformation looks correct at first sight, but
> the data->flags handling in particular looks rather interwoven.
Agreed, but the only flags change that I couldn't convince myself was
correct is this:
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 59f7f6d60cf6..314b36ac2a08 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -309,7 +309,7 @@ static void __init efi_clean_memmap(void)
.desc_version = efi.memmap.desc_version,
.desc_size = efi.memmap.desc_size,
.size = data.desc_size * (efi.memmap.nr_map -
n_removal),
- .flags = 0,
+ .flags = efi.memmap.flags & EFI_MEMMAP_LATE,
};
pr_warn("Removing %d invalid memory map entries.\n", n_removal);
...but efi_clean_memmap() should "early".
On Sat, Feb 1, 2020 at 7:35 AM Jörg Otte <[email protected]> wrote:
[..]
> > I'll take a look. Jörg, can you paste a full dmesg from a good boot?
>
> Here it is.
Much appreciated, I found an old Haswell laptop and am able to
reproduce the boot hang.
On Fri, Jan 31, 2020 at 10:45 AM Ard Biesheuvel
<[email protected]> wrote:
>
> earlycon=efifb may help to get better diagnostics, but you will need to use a camera to capture the output
https://photos.app.goo.gl/uA3Wkaxc8x6A4gK47
On Sat, 1 Feb 2020 at 22:44, Dan Williams <[email protected]> wrote:
>
> On Fri, Jan 31, 2020 at 10:45 AM Ard Biesheuvel
> <[email protected]> wrote:
> >
> > earlycon=efifb may help to get better diagnostics, but you will need to use a camera to capture the output
>
> https://photos.app.goo.gl/uA3Wkaxc8x6A4gK47
Yeah, so it definitely has to do with the n_removal > 0 path.
Did you try the change I suggested to Joerg?
On Sat, Feb 1, 2020 at 1:49 PM Ard Biesheuvel <[email protected]> wrote:
>
> On Sat, 1 Feb 2020 at 22:44, Dan Williams <[email protected]> wrote:
> >
> > On Fri, Jan 31, 2020 at 10:45 AM Ard Biesheuvel
> > <[email protected]> wrote:
> > >
> > > earlycon=efifb may help to get better diagnostics, but you will need to use a camera to capture the output
> >
> > https://photos.app.goo.gl/uA3Wkaxc8x6A4gK47
>
> Yeah, so it definitely has to do with the n_removal > 0 path.
>
> Did you try the change I suggested to Joerg?
Nice, it worked.
Tested-by: Dan Williams <[email protected]>
On Sat, 1 Feb 2020 at 23:31, Dan Williams <[email protected]> wrote:
>
> On Sat, Feb 1, 2020 at 1:49 PM Ard Biesheuvel <[email protected]> wrote:
> >
> > On Sat, 1 Feb 2020 at 22:44, Dan Williams <[email protected]> wrote:
> > >
> > > On Fri, Jan 31, 2020 at 10:45 AM Ard Biesheuvel
> > > <[email protected]> wrote:
> > > >
> > > > earlycon=efifb may help to get better diagnostics, but you will need to use a camera to capture the output
> > >
> > > https://photos.app.goo.gl/uA3Wkaxc8x6A4gK47
> >
> > Yeah, so it definitely has to do with the n_removal > 0 path.
> >
> > Did you try the change I suggested to Joerg?
>
> Nice, it worked.
>
> Tested-by: Dan Williams <[email protected]>
Thanks for confirming. I'll spin a proper patch.
* Ard Biesheuvel <[email protected]> wrote:
> Hello J?rg,
>
> Could you please try whether the change below fixes the issue?
>
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 59f7f6d60cf6..ae923ee8e2b4 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -308,7 +308,7 @@ static void __init efi_clean_memmap(void)
> .phys_map = efi.memmap.phys_map,
> .desc_version = efi.memmap.desc_version,
> .desc_size = efi.memmap.desc_size,
> - .size = data.desc_size * (efi.memmap.nr_map - n_removal),
> + .size = efi.memmap.desc_size * (efi.memmap.nr_map - n_removal),
> .flags = 0,
Oh, I actually noticed this one, but convinced myself that it's correct,
because GCC didn't warn about uninitialized data.
But maybe in this weird case data.desc_size as used within its own
initializer is zero?
Thanks,
Ingo
On Sun, 2 Feb 2020 at 10:22, Ingo Molnar <[email protected]> wrote:
>
>
> * Ard Biesheuvel <[email protected]> wrote:
>
> > Hello Jörg,
> >
> > Could you please try whether the change below fixes the issue?
> >
> > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> > index 59f7f6d60cf6..ae923ee8e2b4 100644
> > --- a/arch/x86/platform/efi/efi.c
> > +++ b/arch/x86/platform/efi/efi.c
> > @@ -308,7 +308,7 @@ static void __init efi_clean_memmap(void)
> > .phys_map = efi.memmap.phys_map,
> > .desc_version = efi.memmap.desc_version,
> > .desc_size = efi.memmap.desc_size,
> > - .size = data.desc_size * (efi.memmap.nr_map - n_removal),
> > + .size = efi.memmap.desc_size * (efi.memmap.nr_map - n_removal),
> > .flags = 0,
>
> Oh, I actually noticed this one, but convinced myself that it's correct,
> because GCC didn't warn about uninitialized data.
>
> But maybe in this weird case data.desc_size as used within its own
> initializer is zero?
>
Something like that, yes. Note that size and desc_size appear in
opposite order in the struct definition, and this may also affect how
the compiler handles this.
Am So., 2. Feb. 2020 um 10:22 Uhr schrieb Ingo Molnar <[email protected]>:
>
>
> * Ard Biesheuvel <[email protected]> wrote:
>
> > Hello Jörg,
> >
> > Could you please try whether the change below fixes the issue?
> >
> > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> > index 59f7f6d60cf6..ae923ee8e2b4 100644
> > --- a/arch/x86/platform/efi/efi.c
> > +++ b/arch/x86/platform/efi/efi.c
> > @@ -308,7 +308,7 @@ static void __init efi_clean_memmap(void)
> > .phys_map = efi.memmap.phys_map,
> > .desc_version = efi.memmap.desc_version,
> > .desc_size = efi.memmap.desc_size,
> > - .size = data.desc_size * (efi.memmap.nr_map - n_removal),
> > + .size = efi.memmap.desc_size * (efi.memmap.nr_map - n_removal),
> > .flags = 0,
>
> Oh, I actually noticed this one, but convinced myself that it's correct,
> because GCC didn't warn about uninitialized data.
>
> But maybe in this weird case data.desc_size as used within its own
> initializer is zero?
>
Patch makes my kernel booting again :)
Thanks, Jörg
* J?rg Otte <[email protected]> wrote:
> Am So., 2. Feb. 2020 um 10:22 Uhr schrieb Ingo Molnar <[email protected]>:
> >
> >
> > * Ard Biesheuvel <[email protected]> wrote:
> >
> > > Hello J?rg,
> > >
> > > Could you please try whether the change below fixes the issue?
> > >
> > > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> > > index 59f7f6d60cf6..ae923ee8e2b4 100644
> > > --- a/arch/x86/platform/efi/efi.c
> > > +++ b/arch/x86/platform/efi/efi.c
> > > @@ -308,7 +308,7 @@ static void __init efi_clean_memmap(void)
> > > .phys_map = efi.memmap.phys_map,
> > > .desc_version = efi.memmap.desc_version,
> > > .desc_size = efi.memmap.desc_size,
> > > - .size = data.desc_size * (efi.memmap.nr_map - n_removal),
> > > + .size = efi.memmap.desc_size * (efi.memmap.nr_map - n_removal),
> > > .flags = 0,
> >
> > Oh, I actually noticed this one, but convinced myself that it's correct,
> > because GCC didn't warn about uninitialized data.
> >
> > But maybe in this weird case data.desc_size as used within its own
> > initializer is zero?
>
> Patch makes my kernel booting again :)
Thank you! I'll send the fix to Linus later today.
Ingo