Hi,
I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
The symptom is a reboot before the video console is available.
I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
services forever". However, that commit is known to be broken. The
proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
During the bisect some of the stopping points landed on commits that
caused the boot process to hang rather than cause a reboot. The
commits that resulted in a hang are marked "git bisect skip" in this
log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
I'll try treating those hangs as bad bisect results and re-run the
full bisect tomorrow. In the meantime I wonder if the bisect log
implicates a better regression candidate?
* Dan Williams <[email protected]> wrote:
> Hi,
>
> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
>
> The symptom is a reboot before the video console is available.
>
> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
> services forever". However, that commit is known to be broken. The
> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
>
> During the bisect some of the stopping points landed on commits that
> caused the boot process to hang rather than cause a reboot. The
> commits that resulted in a hang are marked "git bisect skip" in this
> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
>
> I'll try treating those hangs as bad bisect results and re-run the
> full bisect tomorrow. In the meantime I wonder if the bisect log
> implicates a better regression candidate?
You could also try reverts of the suspicious commits, and then, if the reverted
kernel works fine, create a more linear history by cherry-picking them in the
right order - and then be able to pinpoint the bad commit with a higher
confidence.
Thanks,
Ingo
On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote:
> Hi,
>
> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
>
> The symptom is a reboot before the video console is available.
>
> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
> services forever". However, that commit is known to be broken. The
> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
>
> During the bisect some of the stopping points landed on commits that
> caused the boot process to hang rather than cause a reboot. The
> commits that resulted in a hang are marked "git bisect skip" in this
> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
>
> I'll try treating those hangs as bad bisect results and re-run the
> full bisect tomorrow. In the meantime I wonder if the bisect log
> implicates a better regression candidate?
Could you mail the dmesg output when booting a known working kernel
with efi=debug ?
On Thu, Oct 20, 2016 at 5:29 AM, Matt Fleming <[email protected]> wrote:
> On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote:
>> Hi,
>>
>> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
>>
>> The symptom is a reboot before the video console is available.
>>
>> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
>> services forever". However, that commit is known to be broken. The
>> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
>> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
>>
>> During the bisect some of the stopping points landed on commits that
>> caused the boot process to hang rather than cause a reboot. The
>> commits that resulted in a hang are marked "git bisect skip" in this
>> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
>>
>> I'll try treating those hangs as bad bisect results and re-run the
>> full bisect tomorrow. In the meantime I wonder if the bisect log
>> implicates a better regression candidate?
>
> Could you mail the dmesg output when booting a known working kernel
> with efi=debug ?
Here it is:
https://gist.github.com/djbw/cae05e721b159d5ad7b146d7a93f5fa2
On Thu, Oct 20, 2016 at 8:22 AM, Dan Williams <[email protected]> wrote:
> On Thu, Oct 20, 2016 at 5:29 AM, Matt Fleming <[email protected]> wrote:
>> On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote:
>>> Hi,
>>>
>>> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
>>>
>>> The symptom is a reboot before the video console is available.
>>>
>>> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
>>> services forever". However, that commit is known to be broken. The
>>> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
>>> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
>>>
>>> During the bisect some of the stopping points landed on commits that
>>> caused the boot process to hang rather than cause a reboot. The
>>> commits that resulted in a hang are marked "git bisect skip" in this
>>> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
>>>
>>> I'll try treating those hangs as bad bisect results and re-run the
>>> full bisect tomorrow. In the meantime I wonder if the bisect log
>>> implicates a better regression candidate?
>>
>> Could you mail the dmesg output when booting a known working kernel
>> with efi=debug ?
>
> Here it is:
>
> https://gist.github.com/djbw/cae05e721b159d5ad7b146d7a93f5fa2
I am able to build a kernel and boot the platform with the following
set of reverts:
Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
Revert "efi: Allow drivers to reserve boot services forever"
* Dan Williams <[email protected]> wrote:
> On Thu, Oct 20, 2016 at 8:22 AM, Dan Williams <[email protected]> wrote:
> > On Thu, Oct 20, 2016 at 5:29 AM, Matt Fleming <[email protected]> wrote:
> >> On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote:
> >>> Hi,
> >>>
> >>> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
> >>>
> >>> The symptom is a reboot before the video console is available.
> >>>
> >>> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
> >>> services forever". However, that commit is known to be broken. The
> >>> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
> >>> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
> >>>
> >>> During the bisect some of the stopping points landed on commits that
> >>> caused the boot process to hang rather than cause a reboot. The
> >>> commits that resulted in a hang are marked "git bisect skip" in this
> >>> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
> >>>
> >>> I'll try treating those hangs as bad bisect results and re-run the
> >>> full bisect tomorrow. In the meantime I wonder if the bisect log
> >>> implicates a better regression candidate?
> >>
> >> Could you mail the dmesg output when booting a known working kernel
> >> with efi=debug ?
> >
> > Here it is:
> >
> > https://gist.github.com/djbw/cae05e721b159d5ad7b146d7a93f5fa2
>
> I am able to build a kernel and boot the platform with the following
> set of reverts:
>
> Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
> Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
> Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
> Revert "efi: Allow drivers to reserve boot services forever"
Could you please describe the bootup behavior after each revert? I.e. wild guess:
vanilla kernel:
# spontaneous reboot
+ Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE":
# spontaneous reboot
+ Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data":
# hang
+ Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()":
# hang
+ Revert "efi: Allow drivers to reserve boot services forever":
== works
?
Thanks,
Ingo
On Thu, 20 Oct, at 12:37:16PM, Dan Williams wrote:
>
> I am able to build a kernel and boot the platform with the following
> set of reverts:
>
> Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
> Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
> Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
> Revert "efi: Allow drivers to reserve boot services forever"
FYI, I've been able to reproduce some crash when using your EFI memory
map layout under Qemu and forcing the ESRT driver to reserve the space.
It looks like the new EFI memmap we allocate as part of the
reservation is smaller than the old one - which is backwards.
Still debugging...
On Fri, Oct 21, 2016 at 12:00 AM, Ingo Molnar <[email protected]> wrote:
>
> * Dan Williams <[email protected]> wrote:
>
>> On Thu, Oct 20, 2016 at 8:22 AM, Dan Williams <[email protected]> wrote:
>> > On Thu, Oct 20, 2016 at 5:29 AM, Matt Fleming <[email protected]> wrote:
>> >> On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote:
>> >>> Hi,
>> >>>
>> >>> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
>> >>>
>> >>> The symptom is a reboot before the video console is available.
>> >>>
>> >>> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
>> >>> services forever". However, that commit is known to be broken. The
>> >>> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
>> >>> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
>> >>>
>> >>> During the bisect some of the stopping points landed on commits that
>> >>> caused the boot process to hang rather than cause a reboot. The
>> >>> commits that resulted in a hang are marked "git bisect skip" in this
>> >>> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
>> >>>
>> >>> I'll try treating those hangs as bad bisect results and re-run the
>> >>> full bisect tomorrow. In the meantime I wonder if the bisect log
>> >>> implicates a better regression candidate?
>> >>
>> >> Could you mail the dmesg output when booting a known working kernel
>> >> with efi=debug ?
>> >
>> > Here it is:
>> >
>> > https://gist.github.com/djbw/cae05e721b159d5ad7b146d7a93f5fa2
>>
>> I am able to build a kernel and boot the platform with the following
>> set of reverts:
>>
>> Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
>> Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
>> Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
>> Revert "efi: Allow drivers to reserve boot services forever"
>
> Could you please describe the bootup behavior after each revert? I.e. wild guess:
>
> vanilla kernel:
> # spontaneous reboot
> + Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE":
> # spontaneous reboot
> + Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data":
> # hang
> + Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()":
> # hang
> + Revert "efi: Allow drivers to reserve boot services forever":
> == works
>
> ?
In this case all but the last revert produce the same result, instant
reboot after loading the kernel. I have not been able to pinpoint
what changes that behavior to the hang conditions I saw mid-bisect.
The first three reverts are just there to get the kernel to build
again after reverting "efi: Allow drivers to reserve boot services
forever".
On Fri, 21 Oct, at 04:41:29PM, Matt Fleming wrote:
>
> FYI, I've been able to reproduce some crash when using your EFI memory
> map layout under Qemu and forcing the ESRT driver to reserve the space.
Nope, that was a bug in my hack. I can't get Qemu to crash while using
your memory map layout.
Any chance you can insert "while(1)" loops into the EFI boot paths of
a kernel that is known to reboot, or trigger a triple fault in kernels
that hang, so that we can narrow in on the issue? See:
http://www.codeblueprint.co.uk/2015/04/early-x86-linux-boot-debug-tricks.html
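
To illustrate the idea, here is a minimal sketch of such a checkpoint
helper. It is not code from the kernel tree: DEBUG_HALT_AT,
debug_should_halt() and debug_halt_here() are hypothetical names made
up for this example, and the checkpoint numbering is arbitrary.

```c
#include <stdbool.h>

/*
 * Hypothetical build-time knob (an assumption for this sketch, not a
 * real kernel option): the number of the checkpoint to stop at.
 * 0 disables every checkpoint.
 */
#ifndef DEBUG_HALT_AT
#define DEBUG_HALT_AT 0
#endif

/* Return true when 'checkpoint' is the one selected for halting. */
static inline bool debug_should_halt(int checkpoint, int selected)
{
	return selected != 0 && checkpoint == selected;
}

/*
 * Place calls to this at numbered points along the boot path under
 * suspicion. If the selected checkpoint is reached, spin forever so
 * that a machine which would otherwise reboot visibly hangs instead.
 */
static inline void debug_halt_here(int checkpoint)
{
	if (debug_should_halt(checkpoint, DEBUG_HALT_AT)) {
		for (;;)
			;	/* deliberate hang: execution got this far */
	}
}
```

Built with, say, -DDEBUG_HALT_AT=2, a kernel that previously rebooted
and now hangs got as far as checkpoint 2, so the fault lies after it;
if it still reboots, the fault lies before it. Moving the selected
checkpoint then narrows the window. For kernels that hang rather than
reboot, the inverse trick (forcing a triple fault at the checkpoint) is
not shown here.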
On Fri, Oct 21, 2016 at 1:20 PM, Matt Fleming <[email protected]> wrote:
> On Fri, 21 Oct, at 04:41:29PM, Matt Fleming wrote:
>>
>> FYI, I've been able to reproduce some crash when using your EFI memory
>> map layout under Qemu and forcing the ESRT driver to reserve the space.
>
> Nope, that was a bug in my hack. I can't get Qemu to crash while using
> your memory map layout.
>
> Any chance you can insert "while(1)" loops into the EFI boot paths for
> a kernel that is known to reboot or trigger a triple fault in kernels
> that hang, so that we can narrow in on the issue. See,
>
> http://www.codeblueprint.co.uk/2015/04/early-x86-linux-boot-debug-tricks.html
I can take a look, but it will not be until Monday when I have
physical access to the system again.
JFYI: I added this report to the list of regressions for Linux 4.9. I'll
watch this thread for further updates on this issue to document progress
in my weekly reports. Please let me know via [email protected]
in case the discussion moves to a different place (bugzilla or another
mail thread for example). tia!
Current status (afaics) in my report: This looks stuck. Or was it
discussed (or even fixed) somewhere else?
Ciao, Thorsten
On 22.10.2016 01:20, Dan Williams wrote:
> On Fri, Oct 21, 2016 at 1:20 PM, Matt Fleming <[email protected]> wrote:
>> On Fri, 21 Oct, at 04:41:29PM, Matt Fleming wrote:
>>>
>>> FYI, I've been able to reproduce some crash when using your EFI memory
>>> map layout under Qemu and forcing the ESRT driver to reserve the space.
>>
>> Nope, that was a bug in my hack. I can't get Qemu to crash while using
>> your memory map layout.
>>
>> Any chance you can insert "while(1)" loops into the EFI boot paths for
>> a kernel that is known to reboot or trigger a triple fault in kernels
>> that hang, so that we can narrow in on the issue. See,
>>
>> http://www.codeblueprint.co.uk/2015/04/early-x86-linux-boot-debug-tricks.html
>
> I can take a look, but it will not be until Monday when I have
> physical access to the system again.
On Sun, Oct 30, 2016 at 5:08 AM, Thorsten Leemhuis
<[email protected]> wrote:
> JFYI: I added this report to the list of regressions for Linux 4.9. I'll
> watch this thread for further updates on this issue to document progress
> in my weekly reports. Please let me know via [email protected]
> in case the discussion moves to a different place (bugzilla or another
> mail thread for example). tia!
>
> Current status (afaics) in my report: This looks stuck. Or was it
> discussed (or even fixed) somewhere else?
Thanks, and no, not fixed yet. I've not found the time to run the
experiments Matt needs, but a colleague has offered to look into it.
On Sun, 30 Oct, at 08:59:58AM, Dan Williams wrote:
> On Sun, Oct 30, 2016 at 5:08 AM, Thorsten Leemhuis
> <[email protected]> wrote:
> > JFYI: I added this report to the list of regressions for Linux 4.9. I'll
> > watch this thread for further updates on this issue to document progress
> > in my weekly reports. Please let me know via [email protected]
> > in case the discussion moves to a different place (bugzilla or another
> > mail thread for example). tia!
> >
> > Current status (afaics) in my report: This looks stuck. Or was it
> > discussed (or even fixed) somewhere else?
>
> Thanks, and no, not fixed yet. I've not found the time to run the
> experiments Matt needs, but a colleague has offered to look into it.
Of course, if you are willing to help with debugging, Thorsten, it
would be much appreciated and this bug might get fixed sooner.
On Sun, 2016-10-30 at 08:59 -0700, Dan Williams wrote:
> On Sun, Oct 30, 2016 at 5:08 AM, Thorsten Leemhuis
> <[email protected]> wrote:
> > JFYI: I added this report to the list of regressions for Linux 4.9. I'll
> > watch this thread for further updates on this issue to document progress
> > in my weekly reports. Please let me know via [email protected]
> > in case the discussion moves to a different place (bugzilla or another
> > mail thread for example). tia!
> >
> > Current status (afaics) in my report: This looks stuck. Or was it
> > discussed (or even fixed) somewhere else?
>
> Thanks, and no, not fixed yet. I've not found the time to run the
> experiments Matt needs, but a colleague has offered to look into it.
Dan, was there a special configuration that you enabled for this bug?
The person working on the bug can't reproduce the bug in v4.9-rc1 or
v4.9-rc3.
On Wed, Nov 2, 2016 at 5:41 PM, Neri, Ricardo <[email protected]> wrote:
> On Sun, 2016-10-30 at 08:59 -0700, Dan Williams wrote:
>> On Sun, Oct 30, 2016 at 5:08 AM, Thorsten Leemhuis
>> <[email protected]> wrote:
>> > JFYI: I added this report to the list of regressions for Linux 4.9. I'll
>> > watch this thread for further updates on this issue to document progress
>> > in my weekly reports. Please let me know via [email protected]
>> > in case the discussion moves to a different place (bugzilla or another
>> > mail thread for example). tia!
>> >
>> > Current status (afaics) in my report: This looks stuck. Or was it
>> > discussed (or even fixed) somewhere else?
>>
>> Thanks, and no, not fixed yet. I've not found the time to run the
>> experiments Matt needs, but a colleague has offered to look into it.
>
> Dan, was there a special configuration that you enabled for this bug?
> The person working on the bug can't reproduce the bug in v4.9-rc1 or
> v4.9-rc3.
I am pxe-booting the platform to an nfs root filesystem.