2007-02-23 13:57:59

by Andrew

Subject: 2.6.21-rc1: framebuffer/console boot failure

Hi,

2.6.21-rc1 fails to boot on my machine. As soon as I switch
from grub the screen turns and remains black with no sign
of Tux or any output.

I've run a git-bisect between 2.6.20 (which works fine) and
2.6.21-rc1 and found the first bad commit to be
#59b8175c771040afcd4ad67022b0cc80c216b866 which seems bizarre
to me since this is mostly an ARM commit.

I haven't tried to revert this commit because frankly I don't
know where to go from there with a multi-parent commit.

I'm running on an Asus A8N-VM CSM motherboard with a single-core
AMD64 processor and an nVidia GPU in the PCI-ex slot, using the
standard VESA framebuffer driver.

Relevant configs can be found here:
http://homepage.ntlworld.com/anelless/linux/2.6.21-rc1/


2007-02-23 16:05:20

by Andrew

Subject: Re: 2.6.21-rc1: framebuffer/console boot failure

I have just discovered 2.6.21-rc1 boots with
pci=noacpi ...

2007-02-24 11:13:25

by Andrew Morton

Subject: Re: 2.6.21-rc1: framebuffer/console boot failure

> On Fri, 23 Feb 2007 13:35:50 -0000 (GMT) "Andrew" <[email protected]> wrote:
> Hi,
>
> 2.6.21-rc1 fails to boot on my machine. As soon as I switch
> from grub the screen turns and remains black with no sign
> of Tux or any output.
>
> I've run a git-bisect between 2.6.20 (which works fine) and
> 2.6.21-rc1 and found the first bad commit to be
> #59b8175c771040afcd4ad67022b0cc80c216b866 which seems bizarre
> to me since this is mostly an ARM commit.
>
> I haven't tried to revert this commit because frankly I don't
> know where to go from there with a multi-parent commit.
>
> I'm running on an Asus A8N-VM CSM motherboard with a single-core
> AMD64 processor and an nVidia GPU in the PCI-ex slot, using the
> standard VESA framebuffer driver.
>
> Relevant configs can be found here:
> http://homepage.ntlworld.com/anelless/linux/2.6.21-rc1/
>

and, later,

> I have just discovered 2.6.21-rc1 boots with
> pci=noacpi ...

Presumably this regression was caused by the ACPI merge. Are you able to
capture the dmesg output from the 2.6.20-rc1 boot? netconsole might be
useful here, thanks.

(You get added to the post-2.6.20 regression list, so you'll be hearing
from us quite a lot for the next month. Sorry ;))
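
For anyone following along, here is a minimal sketch of a receiver for the netconsole output requested above. It assumes the booting kernel has been told to send its log to this machine on UDP port 6666 (netconsole's default target port); the function names and the bind address are illustrative choices, not anything from the thread.

```python
import socket

# Assumption: the sender uses netconsole's default target port.
NETCONSOLE_PORT = 6666


def open_listener(host="0.0.0.0", port=NETCONSOLE_PORT, timeout=None):
    """Bind a UDP socket that netconsole datagrams can be sent to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.bind((host, port))
    return sock


def read_messages(sock, count):
    """Return the next `count` datagrams as decoded text."""
    messages = []
    for _ in range(count):
        data, _sender = sock.recvfrom(65535)
        # Kernel log text is normally ASCII; replace anything undecodable.
        messages.append(data.decode("utf-8", errors="replace"))
    return messages
```

Calling open_listener() and then read_messages() in a loop, appending each result to a file, would be enough to keep a log of a boot that never reaches a usable console. A boot that hangs before the network driver comes up will send nothing at all.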

2007-02-24 23:00:13

by Andrew

Subject: Re: 2.6.21-rc1: framebuffer/console boot failure

On Sat, February 24, 2007 11:09 am, Andrew Morton wrote:
>
> Presumably this regression was caused by the ACPI merge. Are you able to
> capture the dmesg output from the 2.6.20-rc1 boot? netconsole might be useful here, thanks.
>

I've confirmed a few things:

1) 2.6.21-rc1 actually will boot intermittently.
2) pci=noacpi always allows 2.6.21-rc1 to boot.
3) 2.6.20 always boots.
4) There doesn't seem to be a pattern (that I can tell) between booting and not booting,
although it'll now boot more often than not (it seemed very much t'other way around yesterday).
5) When 2.6.21-rc1 doesn't boot ('boot'? Am I using the right term here? hmm...) nothing is
sent across netconsole at all.
6) Netconsole is useful.

I've uploaded all the dmesg output I've managed to capture here:
http://homepage.ntlworld.com/anelless/linux/2.6.21-rc1/

> (You get added to the post-2.6.20 regression list, so you'll be hearing
> from us quite a lot for the next month. Sorry ;))
>

Lucky me :)

2007-02-24 23:28:14

by Antonino A. Daplas

Subject: Re: 2.6.21-rc1: framebuffer/console boot failure

On Sat, 2007-02-24 at 23:00 +0000, Andrew Nelless wrote:
> On Sat, February 24, 2007 11:09 am, Andrew Morton wrote:
> >
> > Presumably this regression was caused by the ACPI merge. Are you able to
> > capture the dmesg output from the 2.6.20-rc1 boot? netconsole might be useful here, thanks.
> >
>
> I've confirmed a few things:
>
> 1) 2.6.21-rc1 actually will boot intermittently.
> 2) pci=noacpi always allows 2.6.21-rc1 to boot.
> 3) 2.6.20 always boots.
> 4) There doesn't seem to be a pattern (that I can tell) between booting and not booting,
> although it'll now boot more often than not (it seemed very much t'other way around yesterday).
> 5) When 2.6.21-rc1 doesn't boot ('boot'? Am I using the right term here? hmm...) nothing is
> sent across netconsole at all.
> 6) Netconsole is useful.
>
> I've uploaded all the dmesg output I've managed to capture here:
> http://homepage.ntlworld.com/anelless/linux/2.6.21-rc1/
>
> > (You get added to the post-2.6.20 regression list, so you'll be hearing
> > from us quite a lot for the next month. Sorry ;))
> >
>
> Lucky me :)

How about booting with just vga=normal?

Tony


2007-02-25 11:07:47

by Andrew

Subject: Re: 2.6.21-rc1: framebuffer/console boot failure

On Sat, February 24, 2007 11:30 pm, Antonino A. Daplas wrote:
>
> How about booting with just vga=normal?
>
>
> Tony
>

That seems to work too. I've rebooted about 20 times in a row
and it hasn't done it again yet. Why would this only occur
at higher modes?

In the 2.6.20 dmesg log it reads "Nvidia board detected.
Ignoring ACPI timer override." whereas in 2.6.21-rc1 it doesn't.
Could this be the culprit?

Andrew
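
The comparison described above (a line present in the 2.6.20 dmesg but missing from 2.6.21-rc1) can be automated. This is only a sketch: it assumes both logs have been saved as plain text, and it strips the optional printk timestamp prefix so that otherwise identical lines compare equal. The function names are made up for illustration.

```python
import re

# Matches the "[   12.345678] " prefix that printk timestamps add, if present.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalise(log_text):
    """Strip timestamps and surrounding whitespace from each non-empty line."""
    lines = []
    for line in log_text.splitlines():
        stripped = TIMESTAMP.sub("", line).strip()
        if stripped:
            lines.append(stripped)
    return lines


def only_in_first(first_log, second_log):
    """Return lines of first_log (in order) that never appear in second_log."""
    seen = set(normalise(second_log))
    return [line for line in normalise(first_log) if line not in seen]
```

Passing the 2.6.20 log as first_log and the 2.6.21-rc1 log as second_log lists messages that disappeared; swapping the arguments lists messages that are new. Lines that differ only in addresses or sizes will also show up, so the output still needs reading by eye.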

2007-02-26 12:39:18

by Antonino A. Daplas

Subject: Re: 2.6.21-rc1: framebuffer/console boot failure

On Sun, 2007-02-25 at 11:07 +0000, Andrew Nelless wrote:
> On Sat, February 24, 2007 11:30 pm, Antonino A. Daplas wrote:
> >
> > How about booting with just vga=normal?
> >
> >
> > Tony
> >
>
> That seems to work too. I've rebooted about 20 times in a row
> and it hasn't done it again yet. Why would this only occur
> at higher modes?
>
> In the 2.6.20 dmesg log it reads "Nvidia board detected.
> Ignoring ACPI timer override." whereas in 2.6.21-rc1 it doesn't.
> Could this be the culprit?
>

I don't know; the ACPI code can now probably detect the
presence or absence of the HPET timer.

Can you remove CONFIG_FB_VESA support from your kernel config but boot
as if you have vesafb (i.e. with vga=<VESA mode number>)? Your machine may
boot to completion, but you will have a blank screen. You should still
get output over netconsole, and you can start X. I want to
know if the lockup is related to the framebuffer.

Tony


2007-02-26 18:49:01

by Andrew

Subject: Re: 2.6.21-rc1: framebuffer/console boot failure

On Mon, February 26, 2007 12:41 pm, Antonino A. Daplas wrote:
>
> I don't know; the ACPI code can now probably detect the
> presence or absence of the HPET timer.
>
> Can you remove CONFIG_FB_VESA support from your kernel config but boot
> as if you have vesafb (i.e. with vga=<VESA mode number>)? Your machine may boot to completion, but
> you will have a blank screen. You should still get output over netconsole, and you
> can start X. I want to know if the lockup is related to the framebuffer.
>
> Tony
>
>

I disabled CONFIG_FB_VESA but it is still happening: the boots that
fail intermittently don't send anything over netconsole.

It's tempting to think this is a hardware issue, but I've been
booting 2.6.20 daily since -rc3 and this hasn't happened before
and still doesn't. I've even installed Asus's latest "beta bios"
(which conveniently doesn't come with a changelog) but it had
no effect.

The only thing I can think to do now is another git bisect.
Now that I know this occurs on average about every third boot, I
could do half a dozen reboots between bisections and hopefully
find out what caused the problem.

Unfortunately I won't have the time for such a time-consuming
adventure much before the weekend.

Any further ideas?

-- Andrew

P.S. You mentioned HPET, is this HPET config normal?
andrew@ziggy ~ $ fgrep -i hpet /usr/src/linux-2.6.21-rc1/.config
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_HPET is not set
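
A rough way to judge the bisect plan above (half a dozen reboots per step, with a failure on about every third boot) is to ask how often a bad revision would boot cleanly every time and be marked good by mistake. The sketch below assumes each boot fails independently with the same probability, which the thread has not established; the function names are illustrative.

```python
import math


def false_good_probability(failure_rate, boots):
    """Chance that a bad kernel boots cleanly `boots` times in a row."""
    return (1.0 - failure_rate) ** boots


def boots_needed(failure_rate, max_false_good):
    """Smallest number of clean boots that keeps the false-good chance
    at or below `max_false_good`."""
    return math.ceil(math.log(max_false_good) / math.log(1.0 - failure_rate))
```

With a failure rate of 1/3, six clean boots still leave roughly a 9% chance of wrongly marking a bad revision good at each bisect step, and it takes twelve clean boots to bring that under 1%. A single failed boot, on the other hand, is enough to mark a revision bad.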


2007-02-26 23:07:05

by Antonino A. Daplas

Subject: Re: 2.6.21-rc1: framebuffer/console boot failure

On Mon, 2007-02-26 at 18:48 +0000, Andrew Nelless wrote:
> On Mon, February 26, 2007 12:41 pm, Antonino A. Daplas wrote:
> >
> > I don't know; the ACPI code can now probably detect the
> > presence or absence of the HPET timer.
> >
> > Can you remove CONFIG_FB_VESA support from your kernel config but boot
> > as if you have vesafb (i.e. with vga=<VESA mode number>)? Your machine may boot to completion, but
> > you will have a blank screen. You should still get output over netconsole, and you
> > can start X. I want to know if the lockup is related to the framebuffer.
> >
> > Tony
> >
> >
>
> I disabled CONFIG_FB_VESA but it is still happening: the boots that
> fail intermittently don't send anything over netconsole.
>

Okay, that rules out the console code. The vga= parameter is processed at
the very start, specifically in arch/x86_64/boot/video.S. This is
probably not mixing very well with the rest of the code.

> It's tempting to think this is a hardware issue, but I've been
> booting 2.6.20 daily since -rc3 and this hasn't happened before
> and still doesn't. I've even installed Asus's latest "beta bios"
> (which conveniently doesn't come with a changelog) but it had
> no effect.

If your machine had been broken right from the beginning, I would also
say that this is a hardware issue, but no, it's a regression.

>
> The only thing I can think to do now is another git bisect.
> Now that I know this occurs on average about every third boot, I
> could do half a dozen reboots between bisections and hopefully
> find out what caused the problem.
>
> Unfortunately I won't have the time for such a time-consuming
> adventure much before the weekend.
>
> Any further ideas?
>
> -- Andrew
>
> P.S. You mentioned HPET, is this HPET config normal?
> andrew@ziggy ~ $ fgrep -i hpet /usr/src/linux-2.6.21-rc1/.config
> CONFIG_HPET_TIMER=y
> CONFIG_HPET_EMULATE_RTC=y
> # CONFIG_HPET is not set

Not sure if the timer override workaround for nvidia chipsets is the
culprit, but if you want, you can choose to revert that to the previous
behavior (which is ignoring ACPI timer override). Open
arch/x86_64/kernel/early-quirks.c:nvidia_bugs() and change these lines:

    if (acpi_table_parse(ACPI_SIG_HPET, nvidia_hpet_check))
        return;

into this:

    acpi_table_parse(ACPI_SIG_HPET, nvidia_hpet_check);
    /* return; */

Tony

2007-02-28 01:34:35

by Bill Davidsen

Subject: Re: 2.6.21-rc1: framebuffer/console boot failure

Andrew wrote:
> I have just discovered 2.6.21-rc1 boots with
> pci=noacpi ...
>
Try setting the resolution and frame rate, video=XXX:1280x1024@70 or
such. Worked for me. I like pci=noacpi, though ;-)

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot