2023-11-22 00:07:58

by Bagas Sanjaya

Subject: Fwd: Kernel 6.6.1 hangs on "loading initial ramdisk"

Hi,

I noticed a regression report on Bugzilla [1]. Quoting from it:

> After upgrading from 6.5.9 to 6.6.1 on my Dell Latitude E6420 (Intel i5-2520M) with EndeavourOS, the boot process would hang at "loading initial ramdisk". The issue is present on the 6.6.1 release of both Linux and Linux-zen, but not the 6.5.9 release, which makes me think this is somehow upstream in the kernel, rather than to do with packaging. My current workaround is using the Linux LTS kernel.
>
> I have been unable to consistently reproduce this bug. Between 30 and 50 percent of the time, the "loading initial ramdisk" message is displayed, the disk activity indicator will turn off briefly and then resume blinking, and then the kernel boots as expected. The other 50 to 70 percent of the time, the boot stops at "loading initial ramdisk" and the disk activity indicator turns off, and does not resume blinking. The disk activity light is constantly flashing during normal system operation, so I know it's not secretly booting but not updating the display. I haven't been able to replicate this issue in QEMU. I have seen similar bugs that have been solved by disabling IOMMU, but this has not had any effect. Neither has disabling graphics drivers and modesetting. I have been able to reproduce it while using Nouveau, so I don't believe it has to do with Nvidia's proprietary drivers.
>
> Examining dmesg and journalctl, there don't appear to be ANY logs from the failed boots. I don't believe the kernel is even started on these failed boots. Enabling GRUB debug messages (linux,loader,init,fs,device,disk,partition) shows that the hang occurs after GRUB attempts to start the loaded image: it's able to load the image into memory, but the boot stalls after "Starting image" with a hex address (presumably the start address of the kernel).
>
> I've been trying to compile the kernel myself to see if I can solve the issue, or at least aid in reproducibility, but this is not easy or fast to do on a 2012 i5 processor. I'll update if I can successfully recompile the kernel and if it yields any information.
>
> Please let me know if I should provide any additional information. This is my first time filing a bug here.

See Bugzilla for the full thread and attached grub output.

Anyway, I'm adding this regression to regzbot:

#regzbot introduced: v6.5..v6.6 https://bugzilla.kernel.org/show_bug.cgi?id=218173
#regzbot title: initramfs loading hang on nouveau system (Dell Latitude E6420)

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=218173

--
An old man doll... just what I always wanted! - Clara


2023-12-10 06:41:16

by Bagas Sanjaya

Subject: Re: Fwd: Kernel 6.6.1 hangs on "loading initial ramdisk"

On Wed, Nov 22, 2023 at 07:06:50AM +0700, Bagas Sanjaya wrote:
> Hi,
>
> I noticed a regression report on Bugzilla [1]. Quoting from it:
>
> > After upgrading from 6.5.9 to 6.6.1 on my Dell Latitude E6420 (Intel i5-2520M) with EndeavourOS, the boot process would hang at "loading initial ramdisk". The issue is present on the 6.6.1 release of both Linux and Linux-zen, but not the 6.5.9 release, which makes me think this is somehow upstream in the kernel, rather than to do with packaging. My current workaround is using the Linux LTS kernel.
> >
> > I have been unable to consistently reproduce this bug. Between 30 and 50 percent of the time, the "loading initial ramdisk" message is displayed, the disk activity indicator will turn off briefly and then resume blinking, and then the kernel boots as expected. The other 50 to 70 percent of the time, the boot stops at "loading initial ramdisk" and the disk activity indicator turns off, and does not resume blinking. The disk activity light is constantly flashing during normal system operation, so I know it's not secretly booting but not updating the display. I haven't been able to replicate this issue in QEMU. I have seen similar bugs that have been solved by disabling IOMMU, but this has not had any effect. Neither has disabling graphics drivers and modesetting. I have been able to reproduce it while using Nouveau, so I don't believe it has to do with Nvidia's proprietary drivers.
> >
> > Examining dmesg and journalctl, there don't appear to be ANY logs from the failed boots. I don't believe the kernel is even started on these failed boots. Enabling GRUB debug messages (linux,loader,init,fs,device,disk,partition) shows that the hang occurs after GRUB attempts to start the loaded image: it's able to load the image into memory, but the boot stalls after "Starting image" with a hex address (presumably the start address of the kernel).
> >
> > I've been trying to compile the kernel myself to see if I can solve the issue, or at least aid in reproducibility, but this is not easy or fast to do on a 2012 i5 processor. I'll update if I can successfully recompile the kernel and if it yields any information.
> >
> > Please let me know if I should provide any additional information. This is my first time filing a bug here.
>
> See Bugzilla for the full thread and attached grub output.
>
> Anyway, I'm adding this regression to regzbot:
>
> #regzbot introduced: v6.5..v6.6 https://bugzilla.kernel.org/show_bug.cgi?id=218173
> #regzbot title: initramfs loading hang on nouveau system (Dell Latitude E6420)
>

Another reporter on Bugzilla had bisected the regression, so:

#regzbot introduced: a1b87d54f4e45f

Thanks.

--
An old man doll... just what I always wanted! - Clara


2023-12-10 07:18:43

by Thorsten Leemhuis

Subject: Re: Fwd: Kernel 6.6.1 hangs on "loading initial ramdisk"

[Moved a lot of people CCed in the previous mail to BCC, as I'm pretty
sure they do not care about this regression; at the same time added the
x86 maintainers and the efi list.]

[Top posting for once to make this more easily accessible for everyone.]

Ard, Boris, just to make it obvious: the regression report quoted below
was bisected to a1b87d54f4e45f ("x86/efistub: Avoid legacy decompressor
when doing EFI boot") [v6.6-rc1] from Ard, which was committed by Boris.
There are two users that seem to be affected by this. Both seem to run
Arch. For details see:
https://bugzilla.kernel.org/show_bug.cgi?id=218173

Bagas, FWIW, I know you want to help, but your previous mail is not
helpful at all -- on the contrary, as it is yet another one that is
likely hurting my regression tracking efforts[1]. Please stop and just
tell me about things like this in a private mail, as we agreed on earlier.

Ciao, Thorsten

[1] This is why: You just added Ard and Boris to the CC, but did not
make it obvious *why* they should care about that mail. They (and all
the other recipients) for sure will have no idea what a1b87d54f4e45f
exactly is, so you should have mentioned the commit summary. And doing
that after a big quote makes it worse, as many people now need to scroll
down to see if that mail contains something that might be relevant for
them -- and just a waste of time if not.

Furthermore, sending the first mail of the thread to all those people
and lists was likely not very wise, as nobody is likely to care in a
case like this. And not removing all those people and lists in the
second mail of the thread makes it a lot worse, as it became clear that
many people and lists do not care about it now that the regression was
bisected. Hence it's best to remove them; we all get enough mail already.

All that makes people ignore mails from you -- and maybe about
regression tracking in general. :-(

On 10.12.23 07:40, Bagas Sanjaya wrote:
> On Wed, Nov 22, 2023 at 07:06:50AM +0700, Bagas Sanjaya wrote:
>> Hi,
>>
>> I noticed a regression report on Bugzilla [1]. Quoting from it:
>>
>>> After upgrading from 6.5.9 to 6.6.1 on my Dell Latitude E6420 (Intel i5-2520M) with EndeavourOS, the boot process would hang at "loading initial ramdisk". The issue is present on the 6.6.1 release of both Linux and Linux-zen, but not the 6.5.9 release, which makes me think this is somehow upstream in the kernel, rather than to do with packaging. My current workaround is using the Linux LTS kernel.
>>>
>>> I have been unable to consistently reproduce this bug. Between 30 and 50 percent of the time, the "loading initial ramdisk" message is displayed, the disk activity indicator will turn off briefly and then resume blinking, and then the kernel boots as expected. The other 50 to 70 percent of the time, the boot stops at "loading initial ramdisk" and the disk activity indicator turns off, and does not resume blinking. The disk activity light is constantly flashing during normal system operation, so I know it's not secretly booting but not updating the display. I haven't been able to replicate this issue in QEMU. I have seen similar bugs that have been solved by disabling IOMMU, but this has not had any effect. Neither has disabling graphics drivers and modesetting. I have been able to reproduce it while using Nouveau, so I don't believe it has to do with Nvidia's proprietary drivers.
>>>
>>> Examining dmesg and journalctl, there don't appear to be ANY logs from the failed boots. I don't believe the kernel is even started on these failed boots. Enabling GRUB debug messages (linux,loader,init,fs,device,disk,partition) shows that the hang occurs after GRUB attempts to start the loaded image: it's able to load the image into memory, but the boot stalls after "Starting image" with a hex address (presumably the start address of the kernel).
>>>
>>> I've been trying to compile the kernel myself to see if I can solve the issue, or at least aid in reproducibility, but this is not easy or fast to do on a 2012 i5 processor. I'll update if I can successfully recompile the kernel and if it yields any information.
>>>
>>> Please let me know if I should provide any additional information. This is my first time filing a bug here.
>>
>> See Bugzilla for the full thread and attached grub output.
>>
>> Anyway, I'm adding this regression to regzbot:
>>
>> #regzbot introduced: v6.5..v6.6 https://bugzilla.kernel.org/show_bug.cgi?id=218173
>> #regzbot title: initramfs loading hang on nouveau system (Dell Latitude E6420)
>>
>
> Another reporter on Bugzilla had bisected the regression, so:
>
> #regzbot introduced: a1b87d54f4e45f
>
> Thanks.
>

2023-12-10 07:30:47

by Bagas Sanjaya

Subject: Re: Fwd: Kernel 6.6.1 hangs on "loading initial ramdisk"

On 12/10/23 14:15, Linux regression tracking (Thorsten Leemhuis) wrote:
> [Moved a lot of people CCed in the previous mail to BCC, as I'm pretty
> sure they do not care about this regression; at the same time added the
> x86 maintainers and the efi list.]
>
> [Top posting for once to make this more easily accessible for everyone.]
>
> Ard, Boris, just to make it obvious: the regression report quoted below
> was bisected to a1b87d54f4e45f ("x86/efistub: Avoid legacy decompressor
> when doing EFI boot") [v6.6-rc1] from Ard, which was committed by Boris.
> There are two users that seem to be affected by this. Both seem to run
> Arch. For details see:
> https://bugzilla.kernel.org/show_bug.cgi?id=218173
>
> Bagas, FWIW, I know you want to help, but your previous mail is not
> helpful at all -- on the contrary, as it is yet another one that is
> likely hurting my regression tracking efforts[1]. Please stop and just
> tell me about things like this in a private mail, as we agreed on earlier.
>
> Ciao, Thorsten
>
> [1] This is why: You just added Ard and Boris to the CC, but did not
> make it obvious *why* they should care about that mail. They (and all
> the other recipients) for sure will have no idea what a1b87d54f4e45f
> exactly is, so you should have mentioned the commit summary. And doing
> that after a big quote makes it worse, as many people now need to scroll
> down to see if that mail contains something that might be relevant for
> them -- and just a waste of time if not.
>
> Furthermore, sending the first mail of the thread to all those people
> and lists was likely not very wise, as nobody is likely to care in a
> case like this. And not removing all those people and lists in the
> second mail of the thread makes it a lot worse, as it became clear that
> many people and lists do not care about it now that the regression was
> bisected. Hence it's best to remove them; we all get enough mail already.
>
> All that makes people ignore mails from you -- and maybe about
> regression tracking in general. :-(
>

Oops, I didn't greet additional Cc's as you mentioned (that's my
tendency when handling regressions).

So should we continue tracking this on Bugzilla, keep it on the ML, or
both?

Thanks.

--
An old man doll... just what I always wanted! - Clara

2023-12-10 10:17:38

by Thorsten Leemhuis

Subject: Re: Fwd: Kernel 6.6.1 hangs on "loading initial ramdisk"

On 10.12.23 08:30, Bagas Sanjaya wrote:
> On 12/10/23 14:15, Linux regression tracking (Thorsten Leemhuis) wrote:
[...]
>> [1] This is why: You just added Ard and Boris to the CC, but did not
>> make it obvious *why* they should care about that mail. They (and all
>> the other recipients) for sure will have no idea what a1b87d54f4e45f
>> exactly is, so you should have mentioned the commit summary. And doing
>> that after a big quote makes it worse, as many people now need to scroll
>> down to see if that mail contains something that might be relevant for
>> them -- and just a waste of time if not.
>>
>> Furthermore, sending the first mail of the thread to all those people
>> and lists was likely not very wise, as nobody is likely to care in a
>> case like this. And not removing all those people and lists in the
>> second mail of the thread makes it a lot worse, as it became clear that
>> many people and lists do not care about it now that the regression was
>> bisected. Hence it's best to remove them; we all get enough mail already.
>>
>> All that makes people ignore mails from you -- and maybe about
>> regression tracking in general. :-(
>
> Oops, I didn't greet additional Cc's as you mentioned (that's my
> tendency when handling regressions).

Well, yes, but as mentioned that is just one of several things that
were slightly off.

> So should we continue tracking this on Bugzilla, keep it on the ML, or
> both?

Not sure what you mean. I'll reply in private, no need to bother the
others with even more mail.

BTW: Ard, I noticed you got involved in the ticket. Thx for that!

Ciao, Thorsten