2010-08-03 09:29:08

by Tvrtko Ursulin

Subject: 2.6.35 hangs on early boot in KVM


I built 2.6.35 with make oldconfig, starting from the config of a working
2.6.34. The latter works fine in KVM, while 2.6.35 hangs very early. I see
nothing after GRUB (I have early printk and verbose bootup enabled), just a
blinking VGA cursor and the CPU at 100%.

Config is attached. Any ideas what options I could toggle to debug this? I
tried gzip instead of lzma but it isn't that.

Thanks,

Tvrtko

Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 348 3873 20.


Attachments:
.config (43.31 kB)

2010-08-03 09:45:10

by Tvrtko Ursulin

Subject: Re: 2.6.35 hangs on early boot in KVM

On Tuesday 03 Aug 2010 10:28:56 Tvrtko Ursulin wrote:
> I have basically built 2.6.35 with make oldconfig from a working 2.6.34.
> Latter works fine in kvm while 2.6.35 hangs very early. I see nothing after
> grub (have early printk and verbose bootup enabled), just a blinking VGA
> cursor and CPU at 100%.
>
> Config is attached. Any ideas what options I could toggle to debug this? I
> tried gzip instead of lzma but it isn't that.

I just discovered that I had not removed the quiet boot option, as I thought
I had. With it removed, I see that the boot hangs just after the
"Booting the kernel." line.

Tvrtko


2010-08-03 13:53:14

by Tvrtko Ursulin

Subject: Re: 2.6.35 hangs on early boot in KVM

On Tuesday 03 Aug 2010 10:45:02 Tvrtko Ursulin wrote:
> On Tuesday 03 Aug 2010 10:28:56 Tvrtko Ursulin wrote:
> > I have basically built 2.6.35 with make oldconfig from a working 2.6.34.
> > Latter works fine in kvm while 2.6.35 hangs very early. I see nothing
> > after grub (have early printk and verbose bootup enabled), just a
> > blinking VGA cursor and CPU at 100%.
> >
> > Config is attached. Any ideas what options I could toggle to debug this?
> > I tried gzip instead of lzma but it isn't that.
>
> Just discovered I did not removed the quiet boot option, what I thought I
> have.. with that removed I see that it hangs just after "Booting the
> kernel." line.

Bisection so far:

good: 3de29cab1f8d62db557a4afed0fb17eebfe64438
bad: 537b60d17894b7c19a6060feae40299d7109d6e7


2010-08-03 14:51:13

by Avi Kivity

Subject: Re: 2.6.35 hangs on early boot in KVM

On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
> I have basically built 2.6.35 with make oldconfig from a working 2.6.34.
> Latter works fine in kvm while 2.6.35 hangs very early. I see nothing after
> grub (have early printk and verbose bootup enabled), just a blinking VGA
> cursor and CPU at 100%.
>

Please copy [email protected] on kvm issues.

> CONFIG_PRINTK_TIME=y


Try disabling this as a workaround.

--
error compiling committee.c: too many arguments to function

2010-08-03 14:57:10

by Tvrtko Ursulin

Subject: Re: 2.6.35 hangs on early boot in KVM

On Tuesday 03 Aug 2010 15:51:08 Avi Kivity wrote:
> On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
> > I have basically built 2.6.35 with make oldconfig from a working 2.6.34.
> > Latter works fine in kvm while 2.6.35 hangs very early. I see nothing
> > after grub (have early printk and verbose bootup enabled), just a
> > blinking VGA cursor and CPU at 100%.
>
> Please copy [email protected] on kvm issues.
>
> > CONFIG_PRINTK_TIME=y
>
> Try disabling this as a workaround.

I am in the middle of a bisect run with five builds left to go, currently I
have:

bad 537b60d17894b7c19a6060feae40299d7109d6e7
good 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65

Tvrtko


2010-08-03 15:17:27

by Tvrtko Ursulin

Subject: Re: 2.6.35 hangs on early boot in KVM

On Tuesday 03 Aug 2010 15:57:03 Tvrtko Ursulin wrote:
> On Tuesday 03 Aug 2010 15:51:08 Avi Kivity wrote:
> > On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
> > > I have basically built 2.6.35 with make oldconfig from a working
> > > 2.6.34. Latter works fine in kvm while 2.6.35 hangs very early. I see
> > > nothing after grub (have early printk and verbose bootup enabled),
> > > just a blinking VGA cursor and CPU at 100%.
> >
> > Please copy [email protected] on kvm issues.
> >
> > > CONFIG_PRINTK_TIME=y
> >
> > Try disabling this as a workaround.
>
> I am in the middle of a bisect run with five builds left to go, currently I
> have:
>
> bad 537b60d17894b7c19a6060feae40299d7109d6e7
> good 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65

The bisect is looking good: I have narrowed it down to ten revisions, but I am
not sure I will make it to the end today:

bad cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea
good 41d59102e146a4423a490b8eca68a5860af4fe1c
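
For reference, the number of builds still needed can be estimated from the
number of candidate revisions. The sketch below assumes an idealized linear
history in which every build-and-boot test halves the remaining candidates;
real git history with merges only approximates this. bisect_steps is a
hypothetical helper name used for illustration, not part of git.

```python
def bisect_steps(candidates: int) -> int:
    """Worst-case number of tests needed to single out one revision.

    Assumes (idealized) that each test halves the candidate set, so the
    answer is ceil(log2(candidates)), computed with integer arithmetic.
    """
    if candidates < 1:
        raise ValueError("need at least one candidate revision")
    return (candidates - 1).bit_length()


if __name__ == "__main__":
    # Ten revisions left, as in the message above.
    print(bisect_steps(10))
```

Under that assumption, ten remaining revisions need at most four more builds.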

One interesting warning spotted:

include/config/auto.conf:555:warning: symbol value '-fcall-saved-ecx -fcall-saved-edx' invalid for ARCH_HWEIGHT_CFLAGS


2010-08-03 15:31:10

by Tvrtko Ursulin

Subject: Re: 2.6.35 hangs on early boot in KVM


On Tuesday 03 Aug 2010 16:17:20 Tvrtko Ursulin wrote:
> On Tuesday 03 Aug 2010 15:57:03 Tvrtko Ursulin wrote:
> > On Tuesday 03 Aug 2010 15:51:08 Avi Kivity wrote:
> > > On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
> > > > I have basically built 2.6.35 with make oldconfig from a working
> > > > 2.6.34. Latter works fine in kvm while 2.6.35 hangs very early. I see
> > > > nothing after grub (have early printk and verbose bootup enabled),
> > > > just a blinking VGA cursor and CPU at 100%.
> > >
> > > Please copy [email protected] on kvm issues.
> > >
> > > > CONFIG_PRINTK_TIME=y
> > >
> > > Try disabling this as a workaround.
> >
> > I am in the middle of a bisect run with five builds left to go, currently
> > I have:
> >
> > bad 537b60d17894b7c19a6060feae40299d7109d6e7
> > good 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65
>
> Bisect is looking good, narrowed it to ten revisions, but I am not sure to
> make it to the end today:
>
> bad cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea
> good 41d59102e146a4423a490b8eca68a5860af4fe1c
>
> One interesting waning spotted:
>
> include/config/auto.conf:555:warning: symbol value '-fcall-saved-ecx
> -fcall- saved-edx' invalid for ARCH_HWEIGHT_CFLAGS

Copying Peter and Borislav, guys please look at the above warning. I am
bisecting a non-bootable 2.6.35 under KVM and while I am not there yet, it is
close to the hweight commit (cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea) and I
spotted this warning.

Tvrtko



2010-08-03 15:48:55

by Borislav Petkov

Subject: Re: 2.6.35 hangs on early boot in KVM

From: Tvrtko Ursulin <[email protected]>
Date: Tue, Aug 03, 2010 at 11:31:02AM -0400

> > One interesting waning spotted:
> >
> > include/config/auto.conf:555:warning: symbol value '-fcall-saved-ecx
> > -fcall- saved-edx' invalid for ARCH_HWEIGHT_CFLAGS
>
> Copying Peter and Borislav, guys please look at the above warning. I am
> bisecting a non-bootable 2.6.35 under KVM and while I am not there yet, it is
> close to the hweight commit (cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea) and I
> spotted this warning.

That's because you're at a bisection point before the hweight patch but
your .config already contains the ARCH_HWEIGHT_CFLAGS variable because
of the previous bisection point which contained the hweight patch.

I think this can be safely ignored.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

2010-08-03 16:00:06

by Tvrtko Ursulin

Subject: Re: 2.6.35 hangs on early boot in KVM

On Tuesday 03 Aug 2010 16:17:20 Tvrtko Ursulin wrote:
> On Tuesday 03 Aug 2010 15:57:03 Tvrtko Ursulin wrote:
> > On Tuesday 03 Aug 2010 15:51:08 Avi Kivity wrote:
> > > On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
> > > > I have basically built 2.6.35 with make oldconfig from a working
> > > > 2.6.34. Latter works fine in kvm while 2.6.35 hangs very early. I see
> > > > nothing after grub (have early printk and verbose bootup enabled),
> > > > just a blinking VGA cursor and CPU at 100%.
> > >
> > > Please copy [email protected] on kvm issues.
> > >
> > > > CONFIG_PRINTK_TIME=y
> > >
> > > Try disabling this as a workaround.
> >
> > I am in the middle of a bisect run with five builds left to go, currently
> > I have:
> >
> > bad 537b60d17894b7c19a6060feae40299d7109d6e7
> > good 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65
>
> Bisect is looking good, narrowed it to ten revisions, but I am not sure to
> make it to the end today:
>
> bad cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea
> good 41d59102e146a4423a490b8eca68a5860af4fe1c

Bisect points the finger to "x86, ioapic: In mpparse use mp_register_ioapic"
(cf7500c0ea133d66f8449d86392d83f840102632), so I am copying Eric. No idea
whether this commit is solely to blame or it is a combined interaction with
KVM, but I am sure you guys will know.

If you want me to test something else please shout.

Tvrtko


2010-08-03 16:01:12

by Tvrtko Ursulin

Subject: Re: 2.6.35 hangs on early boot in KVM

On Tuesday 03 Aug 2010 16:49:01 Borislav Petkov wrote:
> From: Tvrtko Ursulin <[email protected]>
> Date: Tue, Aug 03, 2010 at 11:31:02AM -0400
>
> > > One interesting waning spotted:
> > >
> > > include/config/auto.conf:555:warning: symbol value '-fcall-saved-ecx
> > > -fcall- saved-edx' invalid for ARCH_HWEIGHT_CFLAGS
> >
> > Copying Peter and Borislav, guys please look at the above warning. I am
> > bisecting a non-bootable 2.6.35 under KVM and while I am not there yet,
> > it is close to the hweight commit
> > (cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea) and I spotted this warning.
>
> That's because you're at a bisection point before the hweight patch but
> your .config already contains the ARCH_HWEIGHT_CFLAGS variable because
> of the previous bisection point which contained the hweight patch.
>
> I think this can be safely ignored.

Yep, bisect pointed to another commit so I continued another part of this
thread. Thanks for the explanation!

Tvrtko


2010-08-03 20:37:14

by Eric W. Biederman

Subject: Re: 2.6.35 hangs on early boot in KVM

Tvrtko Ursulin <[email protected]> writes:

> On Tuesday 03 Aug 2010 16:17:20 Tvrtko Ursulin wrote:
>> On Tuesday 03 Aug 2010 15:57:03 Tvrtko Ursulin wrote:
>> > On Tuesday 03 Aug 2010 15:51:08 Avi Kivity wrote:
>> > > On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
>> > > > I have basically built 2.6.35 with make oldconfig from a working
>> > > > 2.6.34. Latter works fine in kvm while 2.6.35 hangs very early. I see
>> > > > nothing after grub (have early printk and verbose bootup enabled),
>> > > > just a blinking VGA cursor and CPU at 100%.
>> > >
>> > > Please copy [email protected] on kvm issues.
>> > >
>> > > > CONFIG_PRINTK_TIME=y
>> > >
>> > > Try disabling this as a workaround.
>> >
>> > I am in the middle of a bisect run with five builds left to go, currently
>> > I have:
>> >
>> > bad 537b60d17894b7c19a6060feae40299d7109d6e7
>> > good 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65
>>
>> Bisect is looking good, narrowed it to ten revisions, but I am not sure to
>> make it to the end today:
>>
>> bad cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea
>> good 41d59102e146a4423a490b8eca68a5860af4fe1c
>
> Bisect points the finger to "x86, ioapic: In mpparse use mp_register_ioapic"
> (cf7500c0ea133d66f8449d86392d83f840102632), so I am copying Eric. No idea
> whether this commit is solely to blame or it is a combined interaction with
> KVM, but I am sure you guys will know.
>
> If you want me to test something else please shout.

Interesting. This is the second report I have heard of no VGA output
and a hang early in boot, that was bisected to this commit. Since I
could not reproduce it I was hoping it was a fluke with a single piece
of hardware, but it appears not.

There was in fact an off by one bug in that commit, but if that had
been the issue 2.6.35 would have booted ok. There was nothing in that
commit that should have prevented early output, and in fact I can boot
with a very similar configuration. So I am trying to figure out what
pieces are interacting to cause this failure mode to happen.

What version of kvm are you running on your host (in case that matters)?

I want to reproduce this myself so I can start guessing what weird
interactions are going on.

Eric

2010-08-03 20:57:50

by Yinghai Lu

Subject: Re: 2.6.35 hangs on early boot in KVM

On Tue, Aug 3, 2010 at 8:59 AM, Tvrtko Ursulin
<[email protected]> wrote:
> On Tuesday 03 Aug 2010 16:17:20 Tvrtko Ursulin wrote:
>> On Tuesday 03 Aug 2010 15:57:03 Tvrtko Ursulin wrote:
>> > On Tuesday 03 Aug 2010 15:51:08 Avi Kivity wrote:
>> > > On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
>> > > > I have basically built 2.6.35 with make oldconfig from a working
>> > > > 2.6.34. Latter works fine in kvm while 2.6.35 hangs very early. I see
>> > > > nothing after grub (have early printk and verbose bootup enabled),
>> > > > just a blinking VGA cursor and CPU at 100%.
>> > >
>> > > Please copy [email protected] on kvm issues.
>> > >
>> > > > CONFIG_PRINTK_TIME=y
>> > >
>> > > Try disabling this as a workaround.
>> >
>> > I am in the middle of a bisect run with five builds left to go, currently
>> > I have:
>> >
>> > bad 537b60d17894b7c19a6060feae40299d7109d6e7
>> > good 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65
>>
>> Bisect is looking good, narrowed it to ten revisions, but I am not sure to
>> make it to the end today:
>>
>> bad cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea
>> good 41d59102e146a4423a490b8eca68a5860af4fe1c
>
> Bisect points the finger to "x86, ioapic: In mpparse use mp_register_ioapic"
> (cf7500c0ea133d66f8449d86392d83f840102632), so I am copying Eric. No idea
> whether this commit is solely to blame or it is a combined interaction with
> KVM, but I am sure you guys will know.
>
> If you want me to test something else please shout.
>

Please try the attached patch to see if it helps.

Yinghai


Attachments:
smp_mptable_2.patch (2.94 kB)

2010-08-03 22:28:54

by Donald Parsons

Subject: Re: 2.6.35 hangs on early boot in KVM

I wonder if this thread describes a similar problem. If so, does setting
CONFIG_SATA_AHCI=y make the kernel boot?

http://lkml.indiana.edu/hypermail/linux/kernel/1008.0/00254.html

Don

2010-08-04 08:09:44

by Tvrtko Ursulin

Subject: Re: 2.6.35 hangs on early boot in KVM

On Tuesday 03 Aug 2010 21:37:06 Eric W. Biederman wrote:
> Tvrtko Ursulin <[email protected]> writes:
> > On Tuesday 03 Aug 2010 16:17:20 Tvrtko Ursulin wrote:
> >> On Tuesday 03 Aug 2010 15:57:03 Tvrtko Ursulin wrote:
> >> > On Tuesday 03 Aug 2010 15:51:08 Avi Kivity wrote:
> >> > > On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
> >> > > > I have basically built 2.6.35 with make oldconfig from a working
> >> > > > 2.6.34. Latter works fine in kvm while 2.6.35 hangs very early. I
> >> > > > see nothing after grub (have early printk and verbose bootup
> >> > > > enabled), just a blinking VGA cursor and CPU at 100%.
> >> > >
> >> > > Please copy [email protected] on kvm issues.
> >> > >
> >> > > > CONFIG_PRINTK_TIME=y
> >> > >
> >> > > Try disabling this as a workaround.
> >> >
> >> > I am in the middle of a bisect run with five builds left to go,
> >> > currently I have:
> >> >
> >> > bad 537b60d17894b7c19a6060feae40299d7109d6e7
> >> > good 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65
> >>
> >> Bisect is looking good, narrowed it to ten revisions, but I am not sure
> >> to make it to the end today:
> >>
> >> bad cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea
> >> good 41d59102e146a4423a490b8eca68a5860af4fe1c
> >
> > Bisect points the finger to "x86, ioapic: In mpparse use
> > mp_register_ioapic" (cf7500c0ea133d66f8449d86392d83f840102632), so I am
> > copying Eric. No idea whether this commit is solely to blame or it is a
> > combined interaction with KVM, but I am sure you guys will know.
> >
> > If you want me to test something else please shout.
>
> Interesting. This is the second report I have heard of no VGA output
> and a hang early in boot, that was bisected to this commit. Since I
> could not reproduce it I was hoping it was a fluke with a single piece
> of hardware, but it appears not.
>
> There was in fact an off by one bug in that commit, but if that had
> been the issue 2.6.35 would have booted ok. There was nothing in that
> commit that should have prevented early output, and in fact I can boot
> with a very similar configuration. So I am trying to figure out what
> pieces are interacting to cause this failure mode to happen.
>
> What version of kvm are you running on your host (in case that matters)?
>
> I want to reproduce this myself so I can start guessing what weird
> interactions are going on.

Host is stock openSUSE 11.3 with kvm-0.12.3-2.9.x86_64 and
kernel-desktop-2.6.34-12.3.x86_64. I am attaching my .config again in case it
helps.

Tvrtko





Attachments:
.config (43.31 kB)

2010-08-04 08:19:08

by Tvrtko Ursulin

Subject: Re: 2.6.35 hangs on early boot in KVM

On Tuesday 03 Aug 2010 21:57:48 Yinghai Lu wrote:
> On Tue, Aug 3, 2010 at 8:59 AM, Tvrtko Ursulin
>
> <[email protected]> wrote:
> > On Tuesday 03 Aug 2010 16:17:20 Tvrtko Ursulin wrote:
> >> On Tuesday 03 Aug 2010 15:57:03 Tvrtko Ursulin wrote:
> >> > On Tuesday 03 Aug 2010 15:51:08 Avi Kivity wrote:
> >> > > On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
> >> > > > I have basically built 2.6.35 with make oldconfig from a working
> >> > > > 2.6.34. Latter works fine in kvm while 2.6.35 hangs very early. I
> >> > > > see nothing after grub (have early printk and verbose bootup
> >> > > > enabled), just a blinking VGA cursor and CPU at 100%.
> >> > >
> >> > > Please copy [email protected] on kvm issues.
> >> > >
> >> > > > CONFIG_PRINTK_TIME=y
> >> > >
> >> > > Try disabling this as a workaround.
> >> >
> >> > I am in the middle of a bisect run with five builds left to go,
> >> > currently I have:
> >> >
> >> > bad 537b60d17894b7c19a6060feae40299d7109d6e7
> >> > good 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65
> >>
> >> Bisect is looking good, narrowed it to ten revisions, but I am not sure
> >> to make it to the end today:
> >>
> >> bad cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea
> >> good 41d59102e146a4423a490b8eca68a5860af4fe1c
> >
> > Bisect points the finger to "x86, ioapic: In mpparse use
> > mp_register_ioapic" (cf7500c0ea133d66f8449d86392d83f840102632), so I am
> > copying Eric. No idea whether this commit is solely to blame or it is a
> > combined interaction with KVM, but I am sure you guys will know.
> >
> > If you want me to test something else please shout.
>
> please try attached patch, to see if it help.

No luck (no visible difference, no output on VGA or serial console). (Btw,
there is a typo in pin_2_irq_leagcy; I mention it so that you do not push the
patch as is.)

Tvrtko





2010-08-04 09:06:19

by Yinghai Lu

Subject: Re: 2.6.35 hangs on early boot in KVM

On 08/04/2010 01:18 AM, Tvrtko Ursulin wrote:
> On Tuesday 03 Aug 2010 21:57:48 Yinghai Lu wrote:
>> On Tue, Aug 3, 2010 at 8:59 AM, Tvrtko Ursulin
>>
>> <[email protected]> wrote:
>>> On Tuesday 03 Aug 2010 16:17:20 Tvrtko Ursulin wrote:
>>>> On Tuesday 03 Aug 2010 15:57:03 Tvrtko Ursulin wrote:
>>>>> On Tuesday 03 Aug 2010 15:51:08 Avi Kivity wrote:
>>>>>> On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
>>>>>>> I have basically built 2.6.35 with make oldconfig from a working
>>>>>>> 2.6.34. Latter works fine in kvm while 2.6.35 hangs very early. I
>>>>>>> see nothing after grub (have early printk and verbose bootup
>>>>>>> enabled), just a blinking VGA cursor and CPU at 100%.
>>>>>>
>>>>>> Please copy [email protected] on kvm issues.
>>>>>>
>>>>>>> CONFIG_PRINTK_TIME=y
>>>>>>
>>>>>> Try disabling this as a workaround.
>>>>>
>>>>> I am in the middle of a bisect run with five builds left to go,
>>>>> currently I have:
>>>>>
>>>>> bad 537b60d17894b7c19a6060feae40299d7109d6e7
>>>>> good 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65
>>>>
>>>> Bisect is looking good, narrowed it to ten revisions, but I am not sure
>>>> to make it to the end today:
>>>>
>>>> bad cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea
>>>> good 41d59102e146a4423a490b8eca68a5860af4fe1c
>>>
>>> Bisect points the finger to "x86, ioapic: In mpparse use
>>> mp_register_ioapic" (cf7500c0ea133d66f8449d86392d83f840102632), so I am
>>> copying Eric. No idea whether this commit is solely to blame or it is a
>>> combined interaction with KVM, but I am sure you guys will know.
>>>
>>> If you want me to test something else please shout.
>>
>> please try attached patch, to see if it help.
>
> No luck (no visible difference, no output on VGA or serial console). (Btw
> there is a typo in pin_2_irq_leagcy so that you do not push it directly).

Can you try the current tip with
earlyprintk=ttyS0,115200 or console=uart8250,io,0x3f8,115200?

Thanks

Yinghai

2010-08-04 09:16:15

by Tvrtko Ursulin

Subject: Re: 2.6.35 hangs on early boot in KVM

On Wednesday 04 Aug 2010 10:05:36 Yinghai Lu wrote:
> On 08/04/2010 01:18 AM, Tvrtko Ursulin wrote:
> > On Tuesday 03 Aug 2010 21:57:48 Yinghai Lu wrote:
> >> On Tue, Aug 3, 2010 at 8:59 AM, Tvrtko Ursulin
> >>
> >> <[email protected]> wrote:
> >>> On Tuesday 03 Aug 2010 16:17:20 Tvrtko Ursulin wrote:
> >>>> On Tuesday 03 Aug 2010 15:57:03 Tvrtko Ursulin wrote:
> >>>>> On Tuesday 03 Aug 2010 15:51:08 Avi Kivity wrote:
> >>>>>> On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
> >>>>>>> I have basically built 2.6.35 with make oldconfig from a working
> >>>>>>> 2.6.34. Latter works fine in kvm while 2.6.35 hangs very early. I
> >>>>>>> see nothing after grub (have early printk and verbose bootup
> >>>>>>> enabled), just a blinking VGA cursor and CPU at 100%.
> >>>>>>
> >>>>>> Please copy [email protected] on kvm issues.
> >>>>>>
> >>>>>>> CONFIG_PRINTK_TIME=y
> >>>>>>
> >>>>>> Try disabling this as a workaround.
> >>>>>
> >>>>> I am in the middle of a bisect run with five builds left to go,
> >>>>> currently I have:
> >>>>>
> >>>>> bad 537b60d17894b7c19a6060feae40299d7109d6e7
> >>>>> good 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65
> >>>>
> >>>> Bisect is looking good, narrowed it to ten revisions, but I am not
> >>>> sure to make it to the end today:
> >>>>
> >>>> bad cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea
> >>>> good 41d59102e146a4423a490b8eca68a5860af4fe1c
> >>>
> >>> Bisect points the finger to "x86, ioapic: In mpparse use
> >>> mp_register_ioapic" (cf7500c0ea133d66f8449d86392d83f840102632), so I am
> >>> copying Eric. No idea whether this commit is solely to blame or it is a
> >>> combined interaction with KVM, but I am sure you guys will know.
> >>>
> >>> If you want me to test something else please shout.
> >>
> >> please try attached patch, to see if it help.
> >
> > No luck (no visible difference, no output on VGA or serial console). (Btw
> > there is a typo in pin_2_irq_leagcy so that you do not push it directly).
>
> can you try current tip with
> earlyprintk=ttyS0,115200 or console=uart8250,io,0x3f8,115200?

Not the tip but 2.6.35 with earlyprintk=ttyS0,115200:

[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 2.6.35 (root@kvm-ktest-32) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #4 SMP Wed Aug 4 09:15:10 BST 2010
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
[ 0.000000] BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000002bbfd000 (usable)
[ 0.000000] BIOS-e820: 000000002bbfd000 - 000000002bc00000 (reserved)
[ 0.000000] BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
[ 0.000000] bootconsole [earlyser0] enabled
[ 0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
[ 0.000000] DMI 2.4 present.
[ 0.000000] last_pfn = 0x2bbfd max_arch_pfn = 0x100000
[ 0.000000] PAT not supported by CPU.
[ 0.000000] Scanning 1 areas for low memory corruption
[ 0.000000] modified physical RAM map:
[ 0.000000] modified: 0000000000000000 - 0000000000001000 (reserved)
[ 0.000000] modified: 0000000000001000 - 0000000000002000 (usable)
[ 0.000000] modified: 0000000000002000 - 0000000000010000 (reserved)
[ 0.000000] modified: 0000000000010000 - 000000000009f400 (usable)
[ 0.000000] modified: 000000000009f400 - 00000000000a0000 (reserved)
[ 0.000000] modified: 00000000000f0000 - 0000000000100000 (reserved)
[ 0.000000] modified: 0000000000100000 - 000000002bbfd000 (usable)
[ 0.000000] modified: 000000002bbfd000 - 000000002bc00000 (reserved)
[ 0.000000] modified: 00000000fffbc000 - 0000000100000000 (reserved)
[ 0.000000] found SMP MP-table at [c00f85c0] f85c0
[ 0.000000] init_memory_mapping: 0000000000000000-000000002bbfd000
[ 0.000000] RAMDISK: 1fa29000 - 20d3e000
[ 0.000000] 699MB LOWMEM available.
[ 0.000000] mapped low ram: 0 - 2bbfd000
[ 0.000000] low ram: 0 - 2bbfd000
[ 0.000000] kvm-clock: Using msrs 12 and 11
[ 0.000000] kvm-clock: cpu 0, msr 0:82a341, boot clock
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000001 -> 0x00001000
[ 0.000000] Normal 0x00001000 -> 0x0002bbfd
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[3] active PFN ranges
[ 0.000000] 0: 0x00000001 -> 0x00000002
[ 0.000000] 0: 0x00000010 -> 0x0000009f
[ 0.000000] 0: 0x00000100 -> 0x0002bbfd
[ 0.000000] Using APIC driver default
[ 0.000000] Intel MultiProcessor Specification v1.4
[ 0.000000] Virtual Wire compatibility mode.
[ 0.000000] MPTABLE: OEM ID: BOCHSCPU
[ 0.000000] MPTABLE: Product ID: 0.1
[ 0.000000] MPTABLE: APIC at: 0xFEE00000
[ 0.000000] Processor #0 (Bootup-CPU)
[ 0.000000] I/O APIC #1 Version 17 at 0xFEC00000.
[ 0.000000] BUG: unable to handle kernel paging request at ffffb030
[ 0.000000] IP: [<c011d136>] native_apic_mem_read+0x16/0x20
[ 0.000000] *pde = 00832067 *pte = 00000000
[ 0.000000] Oops: 0000 [#1] SMP
[ 0.000000] last sysfs file:
[ 0.000000] Modules linked in:
[ 0.000000]
[ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.35 #4 /Bochs
[ 0.000000] EIP: 0060:[<c011d136>] EFLAGS: 00010093 CPU: 0
[ 0.000000] EIP is at native_apic_mem_read+0x16/0x20
[ 0.000000] EAX: ffffb030 EBX: 00000001 ECX: c061f220 EDX: fffff000
[ 0.000000] ESI: 00000001 EDI: 00000000 EBP: c060bde8 ESP: c060bde4
[ 0.000000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 0.000000] Process swapper (pid: 0, ti=c060b000 task=c061f220 task.ti=c060b000)
[ 0.000000] Stack:
[ 0.000000] c011c352 c060be34 c0686380 c0598971 00000000 00000000 c060be1a 00000000
[ 0.000000] <0> 00005000 00000046 c0124548 c01247e8 c060be28 c012a95c 00000005 00000000
[ 0.000000] <0> 00000001 00000000 00000000 00000001 c060be3c c0686502 c060be6c c06865ce
[ 0.000000] Call Trace:
[ 0.000000] [<c011c352>] ? get_physical_broadcast+0x22/0x50
[ 0.000000] [<c0686380>] ? io_apic_get_unique_id+0x78/0x1d1
[ 0.000000] [<c0124548>] ? native_set_pte_at+0x8/0x10
[ 0.000000] [<c01247e8>] ? native_flush_tlb_single+0x8/0x10
[ 0.000000] [<c012a95c>] ? set_pte_vaddr+0x6c/0x80
[ 0.000000] [<c0686502>] ? io_apic_unique_id+0x29/0x2b
[ 0.000000] [<c06865ce>] ? mp_register_ioapic+0xca/0x11a
[ 0.000000] [<c0683bfb>] ? MP_ioapic_info+0x44/0x4a
[ 0.000000] [<c0684158>] ? default_get_smp_config+0x334/0x490
[ 0.000000] [<c068ff6c>] ? free_area_init_nodes+0x30c/0x314
[ 0.000000] [<c03e5599>] ? read_pci_config_16+0x9/0x40
[ 0.000000] [<c067cab7>] ? setup_arch+0xa04/0xa5e
[ 0.000000] [<c0144e15>] ? vprintk+0x375/0x480
[ 0.000000] [<c01758db>] ? trace_hardirqs_off+0xb/0x10
[ 0.000000] [<c048fe0d>] ? _raw_spin_unlock_irqrestore+0x5d/0x70
[ 0.000000] [<c0679714>] ? start_kernel+0xdc/0x378
[ 0.000000] [<c06790d7>] ? i386_start_kernel+0xd7/0xdf
[ 0.000000] Code: 4c 62 c0 8d 84 08 00 c0 ff ff 89 10 5d c3 8d b4 26 00 00 00 00 55 89 e5 e8 64 65 fe ff 8b 15 cc 4c 62 c0 5d 8d 84 10 00 c0 ff ff <8b> 00 c3 8d b4 26 00 00 00 00 55 89 e5 53 e8 43 65 fe ff 8b 15
[ 0.000000] EIP: [<c011d136>] native_apic_mem_read+0x16/0x20 SS:ESP 0068:c060bde4
[ 0.000000] CR2: 00000000ffffb030
[ 0.000000] ---[ end trace a7919e7f17c0a725 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] Pid: 0, comm: swapper Tainted: G D 2.6.35 #4
[ 0.000000] Call Trace:
[ 0.000000] [<c048c225>] ? printk+0x1d/0x20
[ 0.000000] [<c048c18b>] panic+0x5a/0xd7
[ 0.000000] [<c0148b19>] do_exit+0x7a9/0x840
[ 0.000000] [<c0145059>] ? kmsg_dump+0x139/0x160
[ 0.000000] [<c048c225>] ? printk+0x1d/0x20
[ 0.000000] [<c0106b15>] oops_end+0x95/0xd0
[ 0.000000] [<c0125dc6>] no_context+0xc6/0x160
[ 0.000000] [<c0125f10>] __bad_area_nosemaphore+0xb0/0x160
[ 0.000000] [<c01245e8>] ? _paravirt_ident_32+0x8/0x10
[ 0.000000] [<c0125fd7>] bad_area_nosemaphore+0x17/0x20
[ 0.000000] [<c0126269>] do_page_fault+0xb9/0x410
[ 0.000000] [<c01261b0>] ? do_page_fault+0x0/0x410
[ 0.000000] [<c0490864>] error_code+0x78/0x80
[ 0.000000] [<c017007b>] ? timer_list_show+0x7cb/0xf00
[ 0.000000] [<c011d136>] ? native_apic_mem_read+0x16/0x20
[ 0.000000] [<c011c352>] ? get_physical_broadcast+0x22/0x50
[ 0.000000] [<c0686380>] io_apic_get_unique_id+0x78/0x1d1
[ 0.000000] [<c0124548>] ? native_set_pte_at+0x8/0x10
[ 0.000000] [<c01247e8>] ? native_flush_tlb_single+0x8/0x10
[ 0.000000] [<c012a95c>] ? set_pte_vaddr+0x6c/0x80
[ 0.000000] [<c0686502>] io_apic_unique_id+0x29/0x2b
[ 0.000000] [<c06865ce>] mp_register_ioapic+0xca/0x11a
[ 0.000000] [<c0683bfb>] MP_ioapic_info+0x44/0x4a
[ 0.000000] [<c0684158>] default_get_smp_config+0x334/0x490
[ 0.000000] [<c068ff6c>] ? free_area_init_nodes+0x30c/0x314
[ 0.000000] [<c03e5599>] ? read_pci_config_16+0x9/0x40
[ 0.000000] [<c067cab7>] setup_arch+0xa04/0xa5e
[ 0.000000] [<c0144e15>] ? vprintk+0x375/0x480
[ 0.000000] [<c01758db>] ? trace_hardirqs_off+0xb/0x10
[ 0.000000] [<c048fe0d>] ? _raw_spin_unlock_irqrestore+0x5d/0x70
[ 0.000000] [<c0679714>] start_kernel+0xdc/0x378
[ 0.000000] [<c06790d7>] i386_start_kernel+0xd7/0xdf
[ 0.000000] Unknown interrupt or fault at: 00000246 00000060 c012422a
[ 0.000000] Pid: 0, comm: swapper Tainted: G D 2.6.35 #4
[ 0.000000] Call Trace:
[ 0.000000] [<c048c225>] ? printk+0x1d/0x20
[ 0.000000] [<c04844c1>] ignore_int+0x3d/0x46
[ 0.000000] [<c012422a>] ? native_irq_enable+0xa/0x10
[ 0.000000] [<c048c1f6>] ? panic+0xc5/0xd7
[ 0.000000] [<c048c1f6>] ? panic+0xc5/0xd7
[ 0.000000] [<c012422a>] ? native_irq_enable+0xa/0x10
[ 0.000000] [<c048c1fc>] ? panic+0xcb/0xd7
[ 0.000000] [<c0148b19>] do_exit+0x7a9/0x840
[ 0.000000] [<c0145059>] ? kmsg_dump+0x139/0x160
[ 0.000000] [<c048c225>] ? printk+0x1d/0x20
[ 0.000000] [<c0106b15>] oops_end+0x95/0xd0
[ 0.000000] [<c0125dc6>] no_context+0xc6/0x160
[ 0.000000] [<c0125f10>] __bad_area_nosemaphore+0xb0/0x160
[ 0.000000] [<c01245e8>] ? _paravirt_ident_32+0x8/0x10
[ 0.000000] [<c0125fd7>] bad_area_nosemaphore+0x17/0x20
[ 0.000000] [<c0126269>] do_page_fault+0xb9/0x410
[ 0.000000] [<c01261b0>] ? do_page_fault+0x0/0x410
[ 0.000000] [<c0490864>] error_code+0x78/0x80
[ 0.000000] [<c017007b>] ? timer_list_show+0x7cb/0xf00
[ 0.000000] [<c011d136>] ? native_apic_mem_read+0x16/0x20
[ 0.000000] [<c011c352>] ? get_physical_broadcast+0x22/0x50
[ 0.000000] [<c0686380>] io_apic_get_unique_id+0x78/0x1d1
[ 0.000000] [<c0124548>] ? native_set_pte_at+0x8/0x10
[ 0.000000] [<c01247e8>] ? native_flush_tlb_single+0x8/0x10
[ 0.000000] [<c012a95c>] ? set_pte_vaddr+0x6c/0x80
[ 0.000000] [<c0686502>] io_apic_unique_id+0x29/0x2b
[ 0.000000] [<c06865ce>] mp_register_ioapic+0xca/0x11a
[ 0.000000] [<c0683bfb>] MP_ioapic_info+0x44/0x4a
[ 0.000000] [<c0684158>] default_get_smp_config+0x334/0x490
[ 0.000000] [<c068ff6c>] ? free_area_init_nodes+0x30c/0x314
[ 0.000000] [<c03e5599>] ? read_pci_config_16+0x9/0x40
[ 0.000000] [<c067cab7>] setup_arch+0xa04/0xa5e
[ 0.000000] [<c0144e15>] ? vprintk+0x375/0x480
[ 0.000000] [<c01758db>] ? trace_hardirqs_off+0xb/0x10
[ 0.000000] [<c048fe0d>] ? _raw_spin_unlock_irqrestore+0x5d/0x70
[ 0.000000] [<c0679714>] start_kernel+0xdc/0x378
[ 0.000000] [<c06790d7>] i386_start_kernel+0xd7/0xdf



2010-08-04 09:19:55

by Tvrtko Ursulin

[permalink] [raw]
Subject: Re: 2.6.35 hangs on early boot in KVM

On Wednesday 04 Aug 2010 10:16:08 Tvrtko Ursulin wrote:
> On Wednesday 04 Aug 2010 10:05:36 Yinghai Lu wrote:
> > On 08/04/2010 01:18 AM, Tvrtko Ursulin wrote:
> > > On Tuesday 03 Aug 2010 21:57:48 Yinghai Lu wrote:
> > >> On Tue, Aug 3, 2010 at 8:59 AM, Tvrtko Ursulin
> > >>
> > >> <[email protected]> wrote:
> > >>> On Tuesday 03 Aug 2010 16:17:20 Tvrtko Ursulin wrote:
> > >>>> On Tuesday 03 Aug 2010 15:57:03 Tvrtko Ursulin wrote:
> > >>>>> On Tuesday 03 Aug 2010 15:51:08 Avi Kivity wrote:
> > >>>>>> On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
> > >>>>>>> I have basically built 2.6.35 with make oldconfig from a working
> > >>>>>>> 2.6.34. Latter works fine in kvm while 2.6.35 hangs very early. I
> > >>>>>>> see nothing after grub (have early printk and verbose bootup
> > >>>>>>> enabled), just a blinking VGA cursor and CPU at 100%.
> > >>>>>>
> > >>>>>> Please copy [email protected] on kvm issues.
> > >>>>>>
> > >>>>>>> CONFIG_PRINTK_TIME=y
> > >>>>>>
> > >>>>>> Try disabling this as a workaround.
> > >>>>>
> > >>>>> I am in the middle of a bisect run with five builds left to go,
> > >>>>> currently I have:
> > >>>>>
> > >>>>> bad 537b60d17894b7c19a6060feae40299d7109d6e7
> > >>>>> good 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65
> > >>>>
> > >>>> Bisect is looking good, narrowed it to ten revisions, but I am not
> > >>>> sure to make it to the end today:
> > >>>>
> > >>>> bad cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea
> > >>>> good 41d59102e146a4423a490b8eca68a5860af4fe1c
> > >>>
> > >>> Bisect points the finger to "x86, ioapic: In mpparse use
> > >>> mp_register_ioapic" (cf7500c0ea133d66f8449d86392d83f840102632), so I
> > >>> am copying Eric. No idea whether this commit is solely to blame or
> > >>> it is a combined interaction with KVM, but I am sure you guys will
> > >>> know.
> > >>>
> > >>> If you want me to test something else please shout.
> > >>
> > >> please try attached patch, to see if it help.
> > >
> > > No luck (no visible difference, no output on VGA or serial console).
> > > (Btw there is a typo in pin_2_irq_leagcy so that you do not push it
> > > directly).
> >
> > can you try current tip with
> > earlyprintk=ttyS0,115200 or console=uart8250,io,0x3f8,115200?
>
> Not the tip but 2.6.35 with earlyprintk=ttyS0,115200:

Correction: the crash log was from 2.6.35 plus your smp_mptable_2.patch.

Tvrtko





2010-08-04 09:35:55

by Yinghai Lu

[permalink] [raw]
Subject: Re: 2.6.35 hangs on early boot in KVM

On 08/04/2010 02:16 AM, Tvrtko Ursulin wrote:
> On Wednesday 04 Aug 2010 10:05:36 Yinghai Lu wrote:
>> On 08/04/2010 01:18 AM, Tvrtko Ursulin wrote:
>>> On Tuesday 03 Aug 2010 21:57:48 Yinghai Lu wrote:
>>>> On Tue, Aug 3, 2010 at 8:59 AM, Tvrtko Ursulin
>>>>
>>>> <[email protected]> wrote:
>>>>> On Tuesday 03 Aug 2010 16:17:20 Tvrtko Ursulin wrote:
>>>>>> On Tuesday 03 Aug 2010 15:57:03 Tvrtko Ursulin wrote:
>>>>>>> On Tuesday 03 Aug 2010 15:51:08 Avi Kivity wrote:
>>>>>>>> On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
>>>>>>>>> I have basically built 2.6.35 with make oldconfig from a working
>>>>>>>>> 2.6.34. Latter works fine in kvm while 2.6.35 hangs very early. I
>>>>>>>>> see nothing after grub (have early printk and verbose bootup
>>>>>>>>> enabled), just a blinking VGA cursor and CPU at 100%.
>>>>>>>>
>>>>>>>> Please copy [email protected] on kvm issues.
>>>>>>>>
>>>>>>>>> CONFIG_PRINTK_TIME=y
>>>>>>>>
>>>>>>>> Try disabling this as a workaround.
>>>>>>>
>>>>>>> I am in the middle of a bisect run with five builds left to go,
>>>>>>> currently I have:
>>>>>>>
>>>>>>> bad 537b60d17894b7c19a6060feae40299d7109d6e7
>>>>>>> good 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65
>>>>>>
>>>>>> Bisect is looking good, narrowed it to ten revisions, but I am not
>>>>>> sure to make it to the end today:
>>>>>>
>>>>>> bad cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea
>>>>>> good 41d59102e146a4423a490b8eca68a5860af4fe1c
>>>>>
>>>>> Bisect points the finger to "x86, ioapic: In mpparse use
>>>>> mp_register_ioapic" (cf7500c0ea133d66f8449d86392d83f840102632), so I am
>>>>> copying Eric. No idea whether this commit is solely to blame or it is a
>>>>> combined interaction with KVM, but I am sure you guys will know.
>>>>>
>>>>> If you want me to test something else please shout.
>>>>
>>>> please try attached patch, to see if it help.
>>>
>>> No luck (no visible difference, no output on VGA or serial console). (Btw
>>> there is a typo in pin_2_irq_leagcy so that you do not push it directly).
>>
>> can you try current tip with
>> earlyprintk=ttyS0,115200 or console=uart8250,io,0x3f8,115200?
>
> Not the tip but 2.6.35 with earlyprintk=ttyS0,115200:
>
> [ 0.000000] Initializing cgroup subsys cpuset
> [ 0.000000] Initializing cgroup subsys cpu
> [ 0.000000] Linux version 2.6.35 (root@kvm-ktest-32) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #4 SMP Wed Aug 4 09:15:10 BST 2010
> [ 0.000000] BIOS-provided physical RAM map:
> [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
> [ 0.000000] BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
> [ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> [ 0.000000] BIOS-e820: 0000000000100000 - 000000002bbfd000 (usable)
> [ 0.000000] BIOS-e820: 000000002bbfd000 - 000000002bc00000 (reserved)
> [ 0.000000] BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
> [ 0.000000] bootconsole [earlyser0] enabled
> [ 0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
> [ 0.000000] DMI 2.4 present.
> [ 0.000000] last_pfn = 0x2bbfd max_arch_pfn = 0x100000
> [ 0.000000] PAT not supported by CPU.
> [ 0.000000] Scanning 1 areas for low memory corruption
> [ 0.000000] modified physical RAM map:
> [ 0.000000] modified: 0000000000000000 - 0000000000001000 (reserved)
> [ 0.000000] modified: 0000000000001000 - 0000000000002000 (usable)
> [ 0.000000] modified: 0000000000002000 - 0000000000010000 (reserved)
> [ 0.000000] modified: 0000000000010000 - 000000000009f400 (usable)
> [ 0.000000] modified: 000000000009f400 - 00000000000a0000 (reserved)
> [ 0.000000] modified: 00000000000f0000 - 0000000000100000 (reserved)
> [ 0.000000] modified: 0000000000100000 - 000000002bbfd000 (usable)
> [ 0.000000] modified: 000000002bbfd000 - 000000002bc00000 (reserved)
> [ 0.000000] modified: 00000000fffbc000 - 0000000100000000 (reserved)
> [ 0.000000] found SMP MP-table at [c00f85c0] f85c0
> [ 0.000000] init_memory_mapping: 0000000000000000-000000002bbfd000
> [ 0.000000] RAMDISK: 1fa29000 - 20d3e000
> [ 0.000000] 699MB LOWMEM available.
> [ 0.000000] mapped low ram: 0 - 2bbfd000
> [ 0.000000] low ram: 0 - 2bbfd000
> [ 0.000000] kvm-clock: Using msrs 12 and 11
> [ 0.000000] kvm-clock: cpu 0, msr 0:82a341, boot clock
> [ 0.000000] Zone PFN ranges:
> [ 0.000000] DMA 0x00000001 -> 0x00001000
> [ 0.000000] Normal 0x00001000 -> 0x0002bbfd
> [ 0.000000] Movable zone start PFN for each node
> [ 0.000000] early_node_map[3] active PFN ranges
> [ 0.000000] 0: 0x00000001 -> 0x00000002
> [ 0.000000] 0: 0x00000010 -> 0x0000009f
> [ 0.000000] 0: 0x00000100 -> 0x0002bbfd
> [ 0.000000] Using APIC driver default
> [ 0.000000] Intel MultiProcessor Specification v1.4
> [ 0.000000] Virtual Wire compatibility mode.
> [ 0.000000] MPTABLE: OEM ID: BOCHSCPU
> [ 0.000000] MPTABLE: Product ID: 0.1
> [ 0.000000] MPTABLE: APIC at: 0xFEE00000
> [ 0.000000] Processor #0 (Bootup-CPU)
> [ 0.000000] I/O APIC #1 Version 17 at 0xFEC00000.

So is your host 32-bit or 64-bit?

Can you boot a working 32-bit guest with "debug apic=debug acpi=off earlyprintk..." to dump the mptable?

Yinghai

2010-08-04 09:36:51

by Gleb Natapov

[permalink] [raw]
Subject: Re: 2.6.35 hangs on early boot in KVM

On Wed, Aug 04, 2010 at 10:16:08AM +0100, Tvrtko Ursulin wrote:
> On Wednesday 04 Aug 2010 10:05:36 Yinghai Lu wrote:
> > On 08/04/2010 01:18 AM, Tvrtko Ursulin wrote:
> > > On Tuesday 03 Aug 2010 21:57:48 Yinghai Lu wrote:
> > >> On Tue, Aug 3, 2010 at 8:59 AM, Tvrtko Ursulin
> > >>
> > >> <[email protected]> wrote:
> > >>> On Tuesday 03 Aug 2010 16:17:20 Tvrtko Ursulin wrote:
> > >>>> On Tuesday 03 Aug 2010 15:57:03 Tvrtko Ursulin wrote:
> > >>>>> On Tuesday 03 Aug 2010 15:51:08 Avi Kivity wrote:
> > >>>>>> On 08/03/2010 12:28 PM, Tvrtko Ursulin wrote:
> > >>>>>>> I have basically built 2.6.35 with make oldconfig from a working
> > >>>>>>> 2.6.34. Latter works fine in kvm while 2.6.35 hangs very early. I
> > >>>>>>> see nothing after grub (have early printk and verbose bootup
> > >>>>>>> enabled), just a blinking VGA cursor and CPU at 100%.
> > >>>>>>
> > >>>>>> Please copy [email protected] on kvm issues.
> > >>>>>>
> > >>>>>>> CONFIG_PRINTK_TIME=y
> > >>>>>>
> > >>>>>> Try disabling this as a workaround.
> > >>>>>
> > >>>>> I am in the middle of a bisect run with five builds left to go,
> > >>>>> currently I have:
> > >>>>>
> > >>>>> bad 537b60d17894b7c19a6060feae40299d7109d6e7
> > >>>>> good 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65
> > >>>>
> > >>>> Bisect is looking good, narrowed it to ten revisions, but I am not
> > >>>> sure to make it to the end today:
> > >>>>
> > >>>> bad cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea
> > >>>> good 41d59102e146a4423a490b8eca68a5860af4fe1c
> > >>>
> > >>> Bisect points the finger to "x86, ioapic: In mpparse use
> > >>> mp_register_ioapic" (cf7500c0ea133d66f8449d86392d83f840102632), so I am
> > >>> copying Eric. No idea whether this commit is solely to blame or it is a
> > >>> combined interaction with KVM, but I am sure you guys will know.
> > >>>
> > >>> If you want me to test something else please shout.
> > >>
> > >> please try attached patch, to see if it help.
> > >
> > > No luck (no visible difference, no output on VGA or serial console). (Btw
> > > there is a typo in pin_2_irq_leagcy so that you do not push it directly).
> >
> > can you try current tip with
> > earlyprintk=ttyS0,115200 or console=uart8250,io,0x3f8,115200?
>
> Not the tip but 2.6.35 with earlyprintk=ttyS0,115200:
>
> [ 0.000000] Processor #0 (Bootup-CPU)
> [ 0.000000] I/O APIC #1 Version 17 at 0xFEC00000.
> [ 0.000000] BUG: unable to handle kernel paging request at ffffb030
> [ 0.000000] IP: [<c011d136>] native_apic_mem_read+0x16/0x20
> [ 0.000000] *pde = 00832067 *pte = 00000000
Accessing APIC version register before APIC is mapped.
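
The diagnosis can be checked against the numbers in the logs. What follows is a minimal user-space sketch, not kernel code. It assumes the APIC fixmap lands at the same virtual address (ffffb000, as reported by the 2.6.34 log in this thread) in the failing 2.6.35 build. The helper names are hypothetical; APIC_LVR is the version register offset from the kernel's apicdef.h.

```c
#include <stdint.h>

/* Values copied from the logs in this thread. */
#define FAULT_ADDR     0xffffb030u /* "unable to handle kernel paging request at ffffb030" */
#define APIC_FIXMAP_VA 0xffffb000u /* "mapped APIC to ffffb000 (fee00000)", 2.6.34 log;
                                      assumed identical in the 2.6.35 build */
#define OOPS_PDE       0x00832067u /* "*pde = 00832067" */
#define OOPS_PTE       0x00000000u /* "*pte = 00000000" */

/* Offset of the local APIC version register. */
#define APIC_LVR 0x30u

/* Hypothetical helper: offset of a faulting address inside the APIC page. */
uint32_t apic_reg_offset(uint32_t fault_addr, uint32_t apic_base)
{
	return fault_addr - apic_base;
}

/* Hypothetical helper: bit 0 of an x86 page table entry is "present". */
int entry_present(uint32_t entry)
{
	return (entry & 0x1u) != 0;
}
```

With these values, apic_reg_offset(FAULT_ADDR, APIC_FIXMAP_VA) equals APIC_LVR, the page directory entry is present, and the page table entry is not. That is consistent with a read of the version register through a fixmap slot that has not been populated yet.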

--
Gleb.

2010-08-04 09:44:39

by Tvrtko Ursulin

[permalink] [raw]
Subject: Re: 2.6.35 hangs on early boot in KVM

On Wednesday 04 Aug 2010 10:34:28 Yinghai Lu wrote:

[snip]

> So is your host 32-bit or 64-bit?

Host is 64-bit.

> Can you boot a working 32-bit guest with "debug apic=debug acpi=off
> earlyprintk..." to dump the mptable?

This? From 2.6.34...

[ 0.000000] Using APIC driver default
[ 0.000000] Intel MultiProcessor Specification v1.4
[ 0.000000] Virtual Wire compatibility mode.
[ 0.000000] mpc: f85d0-f86b8
[ 0.000000] MPTABLE: OEM ID: BOCHSCPU
[ 0.000000] MPTABLE: Product ID: 0.1
[ 0.000000] MPTABLE: APIC at: 0xFEE00000
[ 0.000000] Processor #0 (Bootup-CPU)
[ 0.000000] Bus #0 is PCI
[ 0.000000] Bus #1 is ISA
[ 0.000000] I/O APIC #1 Version 17 at 0xFEC00000.
[ 0.000000] Int: type 0, pol 1, trig 0, bus 00, IRQ 04, APIC ID 1, APIC INT 09
[ 0.000000] Int: type 0, pol 1, trig 0, bus 00, IRQ 0c, APIC ID 1, APIC INT 0b
[ 0.000000] Int: type 0, pol 1, trig 0, bus 00, IRQ 10, APIC ID 1, APIC INT 0b
[ 0.000000] Int: type 0, pol 0, trig 0, bus 01, IRQ 00, APIC ID 1, APIC INT 02
[ 0.000000] Int: type 0, pol 0, trig 0, bus 01, IRQ 01, APIC ID 1, APIC INT 01
[ 0.000000] Int: type 0, pol 0, trig 0, bus 01, IRQ 03, APIC ID 1, APIC INT 03
[ 0.000000] Int: type 0, pol 0, trig 0, bus 01, IRQ 04, APIC ID 1, APIC INT 04
[ 0.000000] Int: type 0, pol 0, trig 0, bus 01, IRQ 05, APIC ID 1, APIC INT 05
[ 0.000000] Int: type 0, pol 0, trig 0, bus 01, IRQ 06, APIC ID 1, APIC INT 06
[ 0.000000] Int: type 0, pol 0, trig 0, bus 01, IRQ 07, APIC ID 1, APIC INT 07
[ 0.000000] Int: type 0, pol 0, trig 0, bus 01, IRQ 08, APIC ID 1, APIC INT 08
[ 0.000000] Int: type 0, pol 0, trig 0, bus 01, IRQ 0a, APIC ID 1, APIC INT 0a
[ 0.000000] Int: type 0, pol 0, trig 0, bus 01, IRQ 0c, APIC ID 1, APIC INT 0c
[ 0.000000] Int: type 0, pol 0, trig 0, bus 01, IRQ 0d, APIC ID 1, APIC INT 0d
[ 0.000000] Int: type 0, pol 0, trig 0, bus 01, IRQ 0e, APIC ID 1, APIC INT 0e
[ 0.000000] Int: type 0, pol 0, trig 0, bus 01, IRQ 0f, APIC ID 1, APIC INT 0f
[ 0.000000] Lint: type 3, pol 0, trig 0, bus 01, IRQ 00, APIC ID 0, APIC LINT 00
[ 0.000000] Lint: type 1, pol 0, trig 0, bus 01, IRQ 00, APIC ID 0, APIC LINT 01
[ 0.000000] Processors: 1
[ 0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] mapped APIC to ffffb000 (fee00000)
[ 0.000000] mapped IOAPIC to ffffa000 (fec00000)
[ 0.000000] nr_irqs_gsi: 24

...

[ 0.116253] Enabling APIC mode: Flat. Using 1 I/O APICs
[ 0.117328] Getting VERSION: 50014
[ 0.118109] Getting VERSION: 50014
[ 0.118888] Getting ID: 0
[ 0.119553] Getting ID: f000000
[ 0.120021] Getting LVT0: 8700
[ 0.120749] Getting LVT1: 8400
[ 0.122213] enabled ExtINT on CPU#0
[ 0.124650] ENABLING IO-APIC IRQs
[ 0.125451] Setting 1 in the phys_id_present_map
[ 0.126393] ...changing IO-APIC physical APIC ID to 1 ... ok.
[ 0.127575] init IO_APIC IRQs
[ 0.128011] 1-0 (apicid-pin) not connected
[ 0.128981] IOAPIC[0]: Set routing entry (1-1 -> 0x31 -> IRQ 1 Mode:0 Active:0)
[ 0.130542] IOAPIC[0]: Set routing entry (1-2 -> 0x30 -> IRQ 0 Mode:0 Active:0)
[ 0.132030] IOAPIC[0]: Set routing entry (1-3 -> 0x33 -> IRQ 3 Mode:0 Active:0)
[ 0.133591] IOAPIC[0]: Set routing entry (1-4 -> 0x34 -> IRQ 4 Mode:0 Active:0)
[ 0.135128] IOAPIC[0]: Set routing entry (1-5 -> 0x35 -> IRQ 5 Mode:0 Active:0)
[ 0.136042] IOAPIC[0]: Set routing entry (1-6 -> 0x36 -> IRQ 6 Mode:0 Active:0)
[ 0.137580] IOAPIC[0]: Set routing entry (1-7 -> 0x37 -> IRQ 7 Mode:0 Active:0)
[ 0.140026] IOAPIC[0]: Set routing entry (1-8 -> 0x38 -> IRQ 8 Mode:0 Active:0)
[ 0.141626] IOAPIC[0]: Set routing entry (1-9 -> 0x39 -> IRQ 9 Mode:1 Active:0)
[ 0.144026] IOAPIC[0]: Set routing entry (1-10 -> 0x3a -> IRQ 10 Mode:0 Active:0)
[ 0.145586] IOAPIC[0]: Set routing entry (1-11 -> 0x3b -> IRQ 11 Mode:1 Active:0)
[ 0.147153] IOAPIC[0]: Set routing entry (1-12 -> 0x3c -> IRQ 12 Mode:0 Active:0)
[ 0.148026] IOAPIC[0]: Set routing entry (1-13 -> 0x3d -> IRQ 13 Mode:0 Active:0)
[ 0.149605] IOAPIC[0]: Set routing entry (1-14 -> 0x3e -> IRQ 14 Mode:0 Active:0)
[ 0.152028] IOAPIC[0]: Set routing entry (1-15 -> 0x3f -> IRQ 15 Mode:0 Active:0)
[ 0.154255] 1-16 1-17 1-18 1-19 1-20 1-21 1-22 1-23 (apicid-pin) not connected
[ 0.156173] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.157281] CPU0: Intel QEMU Virtual CPU version 0.12.3 stepping 03
[ 0.158645] Using local APIC timer interrupts.
[ 0.158646] calibrating APIC timer ...
[ 0.164000] ... lapic delta = 6250358
[ 0.164000] ..... delta 6250358
[ 0.164000] ..... mult: 268434054
[ 0.164000] ..... calibration result: 4000229
[ 0.164000] ..... CPU clock speed is 2926.0767 MHz.
[ 0.164000] ..... host bus clock speed is 1000.0229 MHz.
[ 0.164000] ... verify APIC timer
[ 0.274657] ... jiffies delta = 25
[ 0.275438] ... jiffies result ok
[ 0.276577] Brought up 1 CPUs
[ 0.277281] Total of 1 processors activated (5852.84 BogoMIPS).

...

[ 0.356006] printing PIC contents
[ 0.357286] ... PIC IMR: fffb
[ 0.358013] ... PIC IRR: 0001
[ 0.358736] ... PIC ISR: 0000
[ 0.359453] ... PIC ELCR: 0c00
[ 0.360023] printing local APIC contents on CPU#0/0:
[ 0.360998] ... APIC ID: 00000000 (0)
[ 0.361859] ... APIC VERSION: 00050014
[ 0.362734] ... APIC TASKPRI: 00000000 (00)
[ 0.363607] ... APIC PROCPRI: 00000000
[ 0.364000] ... APIC LDR: 01000000
[ 0.364000] ... APIC DFR: ffffffff
[ 0.364000] ... APIC SPIV: 000001ff
[ 0.364000] ... APIC ISR field:
[ 0.364000] 0000000000000000000000000000000000000000000000000000000000000000
[ 0.364000] ... APIC TMR field:
[ 0.364000] 0000000000000000000000000000000000000000000000000000000000000000
[ 0.364000] ... APIC IRR field:
[ 0.364000] 0000000000000000000000000000000000000000000000000000000000008000
[ 0.364000] ... APIC ESR: 00000000
[ 0.364000] ... APIC ICR: 000c4610
[ 0.364000] ... APIC ICR2: 00000000
[ 0.364000] ... APIC LVTT: 000200ef
[ 0.364000] ... APIC LVTPC: 00010000
[ 0.364000] ... APIC LVT0: 00010700
[ 0.364000] ... APIC LVT1: 00000400
[ 0.364000] ... APIC LVTERR: 000000fe
[ 0.364000] ... APIC TMICT: 0003d09e
[ 0.364000] ... APIC TMCCT: 0002c433
[ 0.364000] ... APIC TDCR: 00000003
[ 0.364000]
[ 0.368005] number of MP IRQ sources: 16.
[ 0.372004] number of IO-APIC #1 registers: 24.
[ 0.372920] testing the IO APIC.......................
[ 0.373935]
[ 0.374439] IO APIC #1......
[ 0.375131] .... register #00: 01000000
[ 0.376002] ....... : physical APIC id: 01
[ 0.376893] ....... : Delivery Type: 0
[ 0.377738] ....... : LTS : 0
[ 0.378626] .... register #01: 00170011
[ 0.379477] ....... : max redirection entries: 0017
[ 0.380003] ....... : PRQ implemented: 0
[ 0.380872] ....... : IO APIC version: 0011
[ 0.381783] .... register #02: 01000000
[ 0.382604] ....... : arbitration: 01
[ 0.384002] .... IRQ redirection table:
[ 0.384825] NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
[ 0.385880] 00 000 1 0 0 0 0 0 0 00
[ 0.386972] 01 001 0 0 0 0 0 1 1 31
[ 0.388785] 02 001 0 0 0 0 0 1 1 30
[ 0.389908] 03 001 0 0 0 0 0 1 1 33
[ 0.390997] 04 001 0 0 0 0 0 1 1 34
[ 0.392020] 05 001 0 0 0 0 0 1 1 35
[ 0.393111] 06 001 0 0 0 0 0 1 1 36
[ 0.394199] 07 001 0 0 0 0 0 1 1 37
[ 0.395288] 08 001 0 0 0 0 0 1 1 38
[ 0.396016] 09 001 1 1 0 0 0 1 1 39
[ 0.397169] 0a 001 0 0 0 0 0 1 1 3A
[ 0.398262] 0b 001 1 1 0 0 0 1 1 3B
[ 0.399348] 0c 001 0 0 0 0 0 1 1 3C
[ 0.400753] 0d 001 0 0 0 0 0 1 1 3D
[ 0.401844] 0e 001 0 0 0 0 0 1 1 3E
[ 0.402937] 0f 001 0 0 0 0 0 1 1 3F
[ 0.404016] 10 000 1 0 0 0 0 0 0 00
[ 0.405104] 11 000 1 0 0 0 0 0 0 00
[ 0.406229] 12 000 1 0 0 0 0 0 0 00
[ 0.407330] 13 000 1 0 0 0 0 0 0 00
[ 0.408759] 14 000 1 0 0 0 0 0 0 00
[ 0.410255] 15 000 1 0 0 0 0 0 0 00
[ 0.411355] 16 000 1 0 0 0 0 0 0 00
[ 0.412017] 17 000 1 0 0 0 0 0 0 00
[ 0.413101] IRQ to pin mappings:
[ 0.413843] IRQ0 -> 0:2
[ 0.414758] IRQ1 -> 0:1
[ 0.416003] IRQ3 -> 0:3
[ 0.416784] IRQ4 -> 0:4
[ 0.417571] IRQ5 -> 0:5
[ 0.419026] IRQ6 -> 0:6
[ 0.419827] IRQ7 -> 0:7
[ 0.420495] IRQ8 -> 0:8
[ 0.421270] IRQ9 -> 0:9
[ 0.422058] IRQ10 -> 0:10
[ 0.422871] IRQ11 -> 0:11
[ 0.423673] IRQ12 -> 0:12
[ 0.424580] IRQ13 -> 0:13
[ 0.425387] IRQ14 -> 0:14
[ 0.426182] IRQ15 -> 0:15
[ 0.427028] .................................... done.
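
For reference, the version words in the dump above can be decoded by hand. This is a small sketch with hypothetical function names. The bit layouts (version in bits 0-7, maximum entry index in bits 16-23) match what the dump itself prints for the IO-APIC and what the kernel's GET_APIC_VERSION and GET_APIC_MAXLVT macros extract for the local APIC.

```c
#include <stdint.h>

/* Local APIC version register ("APIC VERSION: 00050014" in the dump). */
uint32_t lapic_version(uint32_t lvr)
{
	return lvr & 0xFFu;          /* bits 0-7: version */
}

uint32_t lapic_maxlvt(uint32_t lvr)
{
	return (lvr >> 16) & 0xFFu;  /* bits 16-23: highest LVT entry index */
}

/* IO-APIC register #01 ("register #01: 00170011" in the dump). */
uint32_t ioapic_version(uint32_t reg01)
{
	return reg01 & 0xFFu;         /* bits 0-7: version */
}

uint32_t ioapic_max_redir(uint32_t reg01)
{
	return (reg01 >> 16) & 0xFFu; /* bits 16-23: highest redirection entry index */
}
```

Applied to the dump, 00050014 gives local APIC version 0x14 with a highest LVT index of 5. 00170011 gives IO-APIC version 0x11, which is the "Version 17" printed while parsing the MP table, and a highest redirection index of 0x17, which is 24 pins.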




2010-08-04 10:37:51

by Eric W. Biederman

[permalink] [raw]
Subject: Re: 2.6.35 hangs on early boot in KVM

Gleb Natapov <[email protected]> writes:

> On Wed, Aug 04, 2010 at 10:16:08AM +0100, Tvrtko Ursulin wrote:
>>
>> Not the tip but 2.6.35 with earlyprintk=ttyS0,115200:
>>
>> [ 0.000000] Processor #0 (Bootup-CPU)
>> [ 0.000000] I/O APIC #1 Version 17 at 0xFEC00000.
>> [ 0.000000] BUG: unable to handle kernel paging request at ffffb030
>> [ 0.000000] IP: [<c011d136>] native_apic_mem_read+0x16/0x20
>> [ 0.000000] *pde = 00832067 *pte = 00000000
> Accessing APIC version register before APIC is mapped.

Yep. I see it now. We have some of the silliest code. We only
go down this path for certain revs of Intel cpus, and I double checked
this change on an AMD cpu which explains why I missed hitting this
case.

The call path that is new in the bisected commit is:
MP_ioapic_info()
mp_register_ioapic()
io_apic_unique_id()
io_apic_get_unique_id()
get_physical_broadcast()
modern_apic()
lapic_get_version()
apic_read(APIC_LVR)


Tvrtko can you test this patch and verify it fixes the kvm booting
issue?

This patch just maps the lapic early in mpparse.c, just like we
do in acpi/boot.c when parsing the ACPI tables.
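
The ordering problem can be illustrated with a toy user-space model. Everything below is a hypothetical stand-in rather than kernel code: toy_map_lapic() plays the role of set_fixmap_nocache(FIX_APIC_BASE, ...), and toy_apic_read() returns false at the point where the real apic_read() would take a page fault.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_LVR        0x30u  /* version register offset, as in the kernel */
#define APIC_PAGE_WORDS 1024u  /* one 4 KiB page of 32-bit registers */

/* Toy stand-in for the local APIC page behind the fixmap slot. */
struct toy_lapic {
	bool mapped;
	uint32_t regs[APIC_PAGE_WORDS];
};

/* Models mapping the local APIC into the fixmap. */
void toy_map_lapic(struct toy_lapic *l)
{
	l->mapped = true;
}

/* Models apic_read(): fails where the real kernel would fault. */
bool toy_apic_read(const struct toy_lapic *l, uint32_t reg, uint32_t *out)
{
	if (!l->mapped)
		return false;
	*out = l->regs[reg / 4];
	return true;
}

/*
 * Models the MP table path: optionally map the local APIC first, then
 * read the version register, as the call chain under MP_ioapic_info()
 * ends up doing.
 */
bool toy_parse_mptable(struct toy_lapic *l, bool map_early, uint32_t *version)
{
	if (map_early)
		toy_map_lapic(l);
	return toy_apic_read(l, APIC_LVR, version);
}
```

Calling toy_parse_mptable() with map_early set to false fails, which corresponds to the oops; with map_early set to true, the read succeeds. The patch makes the equivalent change by mapping the local APIC in smp_read_mpc() before the IO-APIC entries are processed.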

Eric


---

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a96489e..c07e513 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1606,7 +1606,7 @@ void __init init_apic_mappings(void)
* acpi lapic path already maps that address in
* acpi_register_lapic_address()
*/
- if (!acpi_lapic)
+ if (!acpi_lapic && !smp_found_config)
set_fixmap_nocache(FIX_APIC_BASE, apic_phys);

apic_printk(APIC_VERBOSE, "mapped APIC to %08lx (%08lx)\n",
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index d86dbf7..d7b6f7f 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -274,6 +274,18 @@ static void __init smp_dump_mptable(struct mpc_table *mpc, unsigned char *mpt)

void __init default_smp_read_mpc_oem(struct mpc_table *mpc) { }

+static void __init smp_register_lapic_address(unsigned long address)
+{
+ mp_lapic_addr = address;
+
+ set_fixmap_nocache(FIX_APIC_BASE, address);
+ if (boot_cpu_physical_apicid == -1U) {
+ boot_cpu_physical_apicid = read_apic_id();
+ apic_version[boot_cpu_physical_apicid] =
+ GET_APIC_VERSION(apic_read(APIC_LVR));
+ }
+}
+
static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
{
char str[16];
@@ -295,6 +307,10 @@ static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
if (early)
return 1;

+ /* Initialize the lapic mapping */
+ if (!acpi_lapic)
+ smp_register_lapic_address(mpc->lapic);
+
if (mpc->oemptr)
x86_init.mpparse.smp_read_mpc_oem(mpc);

2010-08-04 10:47:06

by Tvrtko Ursulin

[permalink] [raw]
Subject: Re: 2.6.35 hangs on early boot in KVM

On Wednesday 04 Aug 2010 11:37:43 Eric W. Biederman wrote:
> Gleb Natapov <[email protected]> writes:
> > On Wed, Aug 04, 2010 at 10:16:08AM +0100, Tvrtko Ursulin wrote:
> >> Not the tip but 2.6.35 with earlyprintk=ttyS0,115200:
> >>
> >> [ 0.000000] Processor #0 (Bootup-CPU)
> >> [ 0.000000] I/O APIC #1 Version 17 at 0xFEC00000.
> >> [ 0.000000] BUG: unable to handle kernel paging request at ffffb030
> >> [ 0.000000] IP: [<c011d136>] native_apic_mem_read+0x16/0x20
> >> [ 0.000000] *pde = 00832067 *pte = 00000000
> >
> > Accessing APIC version register before APIC is mapped.
>
> Yep. I see it now. We have some of the silliest code. We only
> go down this path for certain revs of Intel cpus, and I double checked
> this change on an AMD cpu which explains why I missed hitting this
> case.
>
> The call path that is new in the bisected commit is:
> MP_ioapic_info()
> mp_register_ioapic()
> io_apic_unique_id()
> io_apic_get_unique_id()
> get_physical_broadcast()
> modern_apic()
> lapic_get_version()
> apic_read(APIC_LVR)
>
>
> Tvrtko can you test this patch and verify it fixes the kvm booting
> issue?
>
> This patch just maps the lapic early in mpparse.c, just like we
> do in acpi/boot.c when parsing the ACPI tables.

Boots fine, thanks!

Tvrtko


2010-08-04 20:30:45

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH] x86/apic: Map the local apic when parsing the MP table.


This fixes a regression in 2.6.35 from 2.6.34, that is
present for select models of Intel cpus when people are
using an MP table.

The commit cf7500c0ea133d66f8449d86392d83f840102632
"x86, ioapic: In mpparse use mp_register_ioapic" started
calling mp_register_ioapic from MP_ioapic_info. An extremely
simple change that was obviously correct. Unfortunately
mp_register_ioapic did just a little more than the previous
hand crafted code and so we gained this call path.

The problem call path is:
MP_ioapic_info()
mp_register_ioapic()
io_apic_unique_id()
io_apic_get_unique_id()
get_physical_broadcast()
modern_apic()
lapic_get_version()
apic_read(APIC_LVR)

Which turned out to be a problem because the local apic
was not mapped, at that point, unlike the similar point
in the ACPI parsing code.

This problem is fixed by mapping the local apic when
parsing the mptable as soon as we reasonably can.

Looking at the number of places we set up the fixmap for
the local apic, I see some serious simplification opportunities.
For the moment, except for not duplicating the setup of the
fixmap in init_apic_mappings, I have not acted on them.

The regression from 2.6.34 is tracked in bug
https://bugzilla.kernel.org/show_bug.cgi?id=16173

Cc: [email protected]
Reported-by: David Hill <[email protected]>
Reported-by: Tvrtko Ursulin <[email protected]>
Tested-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
---
arch/x86/kernel/apic/apic.c | 2 +-
arch/x86/kernel/mpparse.c | 16 ++++++++++++++++
2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a96489e..c07e513 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1606,7 +1606,7 @@ void __init init_apic_mappings(void)
* acpi lapic path already maps that address in
* acpi_register_lapic_address()
*/
- if (!acpi_lapic)
+ if (!acpi_lapic && !smp_found_config)
set_fixmap_nocache(FIX_APIC_BASE, apic_phys);

apic_printk(APIC_VERBOSE, "mapped APIC to %08lx (%08lx)\n",
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index d86dbf7..d7b6f7f 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -274,6 +274,18 @@ static void __init smp_dump_mptable(struct mpc_table *mpc, unsigned char *mpt)

void __init default_smp_read_mpc_oem(struct mpc_table *mpc) { }

+static void __init smp_register_lapic_address(unsigned long address)
+{
+ mp_lapic_addr = address;
+
+ set_fixmap_nocache(FIX_APIC_BASE, address);
+ if (boot_cpu_physical_apicid == -1U) {
+ boot_cpu_physical_apicid = read_apic_id();
+ apic_version[boot_cpu_physical_apicid] =
+ GET_APIC_VERSION(apic_read(APIC_LVR));
+ }
+}
+
static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
{
char str[16];
@@ -295,6 +307,10 @@ static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
if (early)
return 1;

+ /* Initialize the lapic mapping */
+ if (!acpi_lapic)
+ smp_register_lapic_address(mpc->lapic);
+
if (mpc->oemptr)
x86_init.mpparse.smp_read_mpc_oem(mpc);

--
1.6.5.2.143.g8cc62

2010-08-04 21:51:26

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: Map the local apic when parsing the MP table.

On 08/04/2010 01:30 PM, Eric W. Biederman wrote:
>
> This fixes a regression in 2.6.35 from 2.6.34, that is
> present for select models of Intel cpus when people are
> using an MP table.
>
> The commit cf7500c0ea133d66f8449d86392d83f840102632
> "x86, ioapic: In mpparse use mp_register_ioapic" started
> calling mp_register_ioapic from MP_ioapic_info. An extremely
> simple change that was obviously correct. Unfortunately
> mp_register_ioapic did just a little more than the previous
> hand crafted code and so we gained this call path.
>
> The problem call path is:
> MP_ioapic_info()
> mp_register_ioapic()
> io_apic_unique_id()
> io_apic_get_unique_id()
> get_physical_broadcast()
> modern_apic()
> lapic_get_version()
> apic_read(APIC_LVR)
>
> Which turned out to be a problem because the local apic
> was not mapped, at that point, unlike the similar point
> in the ACPI parsing code.
>
> This problem is fixed by mapping the local apic when
> parsing the mptable as soon as we reasonably can.
>
> Looking at the number of places we setup the fixmap for
> the local apic, I see some serious simplification opportunities.
> For the moment except for not duplicating the setting up of the
> fixmap in init_apic_mappings, I have not acted on them.
>
> The regression from 2.6.34 is tracked in bug
> https://bugzilla.kernel.org/show_bug.cgi?id=16173
>
> Cc: [email protected]
> Reported-by: David Hill <[email protected]>
> Reported-by: Tvrtko Ursulin <[email protected]>
> Tested-by: Tvrtko Ursulin <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>
> ---
> arch/x86/kernel/apic/apic.c | 2 +-
> arch/x86/kernel/mpparse.c | 16 ++++++++++++++++
> 2 files changed, 17 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index a96489e..c07e513 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1606,7 +1606,7 @@ void __init init_apic_mappings(void)
> * acpi lapic path already maps that address in
> * acpi_register_lapic_address()
> */
> - if (!acpi_lapic)
> + if (!acpi_lapic && !smp_found_config)
> set_fixmap_nocache(FIX_APIC_BASE, apic_phys);
>
> apic_printk(APIC_VERBOSE, "mapped APIC to %08lx (%08lx)\n",
> diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
> index d86dbf7..d7b6f7f 100644
> --- a/arch/x86/kernel/mpparse.c
> +++ b/arch/x86/kernel/mpparse.c
> @@ -274,6 +274,18 @@ static void __init smp_dump_mptable(struct mpc_table *mpc, unsigned char *mpt)
>
> void __init default_smp_read_mpc_oem(struct mpc_table *mpc) { }
>
> +static void __init smp_register_lapic_address(unsigned long address)
> +{
> + mp_lapic_addr = address;
> +
> + set_fixmap_nocache(FIX_APIC_BASE, address);
> + if (boot_cpu_physical_apicid == -1U) {
> + boot_cpu_physical_apicid = read_apic_id();
> + apic_version[boot_cpu_physical_apicid] =
> + GET_APIC_VERSION(apic_read(APIC_LVR));
> + }
> +}
> +
> static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
> {
> char str[16];
> @@ -295,6 +307,10 @@ static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
> if (early)
> return 1;
>
> + /* Initialize the lapic mapping */
> + if (!acpi_lapic)
> + smp_register_lapic_address(mpc->lapic);
> +
> if (mpc->oemptr)
> x86_init.mpparse.smp_read_mpc_oem(mpc);
>

Acked-by: Yinghai Lu <[email protected]>

I will send out two more cleanup patches on top of this one.

Yinghai

2010-08-04 21:59:27

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 1/2] x86, acpi: merge two register_lapic_address()


acpi_register_lapic_address() and smp_register_lapic_address() are
identical, so merge them into one function and move it to apic.c.

Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/include/asm/apic.h | 1 +
arch/x86/kernel/acpi/boot.c | 16 ++--------------
arch/x86/kernel/apic/apic.c | 12 ++++++++++++
arch/x86/kernel/mpparse.c | 12 ------------
4 files changed, 15 insertions(+), 26 deletions(-)

Index: linux-2.6/arch/x86/include/asm/apic.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/apic.h
+++ linux-2.6/arch/x86/include/asm/apic.h
@@ -234,6 +234,7 @@ extern void init_bsp_APIC(void);
extern void setup_local_APIC(void);
extern void end_local_APIC_setup(void);
extern void init_apic_mappings(void);
+void smp_register_lapic_address(unsigned long address);
extern void setup_boot_APIC_clock(void);
extern void setup_secondary_APIC_clock(void);
extern int APIC_init_uniprocessor(void);
Index: linux-2.6/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
+++ linux-2.6/arch/x86/kernel/acpi/boot.c
@@ -820,18 +820,6 @@ static int __init acpi_parse_fadt(struct
* returns 0 on success, < 0 on error
*/

-static void __init acpi_register_lapic_address(unsigned long address)
-{
- mp_lapic_addr = address;
-
- set_fixmap_nocache(FIX_APIC_BASE, address);
- if (boot_cpu_physical_apicid == -1U) {
- boot_cpu_physical_apicid = read_apic_id();
- apic_version[boot_cpu_physical_apicid] =
- GET_APIC_VERSION(apic_read(APIC_LVR));
- }
-}
-
static int __init early_acpi_parse_madt_lapic_addr_ovr(void)
{
int count;
@@ -853,7 +841,7 @@ static int __init early_acpi_parse_madt_
return count;
}

- acpi_register_lapic_address(acpi_lapic_addr);
+ smp_register_lapic_address(acpi_lapic_addr);

return count;
}
@@ -880,7 +868,7 @@ static int __init acpi_parse_madt_lapic_
return count;
}

- acpi_register_lapic_address(acpi_lapic_addr);
+ smp_register_lapic_address(acpi_lapic_addr);

count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_SAPIC,
acpi_parse_sapic, MAX_APICS);
Index: linux-2.6/arch/x86/kernel/apic/apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/apic.c
+++ linux-2.6/arch/x86/kernel/apic/apic.c
@@ -1632,6 +1632,18 @@ void __init init_apic_mappings(void)
}
}

+void __init smp_register_lapic_address(unsigned long address)
+{
+ mp_lapic_addr = address;
+
+ set_fixmap_nocache(FIX_APIC_BASE, address);
+ if (boot_cpu_physical_apicid == -1U) {
+ boot_cpu_physical_apicid = read_apic_id();
+ apic_version[boot_cpu_physical_apicid] =
+ GET_APIC_VERSION(apic_read(APIC_LVR));
+ }
+}
+
/*
* This initializes the IO-APIC and APIC hardware if this is
* a UP kernel.
Index: linux-2.6/arch/x86/kernel/mpparse.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/mpparse.c
+++ linux-2.6/arch/x86/kernel/mpparse.c
@@ -274,18 +274,6 @@ static void __init smp_dump_mptable(stru

void __init default_smp_read_mpc_oem(struct mpc_table *mpc) { }

-static void __init smp_register_lapic_address(unsigned long address)
-{
- mp_lapic_addr = address;
-
- set_fixmap_nocache(FIX_APIC_BASE, address);
- if (boot_cpu_physical_apicid == -1U) {
- boot_cpu_physical_apicid = read_apic_id();
- apic_version[boot_cpu_physical_apicid] =
- GET_APIC_VERSION(apic_read(APIC_LVR));
- }
-}
-
static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
{
char str[16];

2010-08-04 22:01:22

by Yinghai Lu

Subject: [PATCH 2/2] x86: remove early_init_lapic_mapping


It is almost the same as smp_register_lapic_address().

We just need to make smp_read_mpc() call smp_register_lapic_address() when early == 1.

Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/include/asm/apic.h | 1 -
arch/x86/kernel/apic/apic.c | 24 ++----------------------
arch/x86/kernel/mpparse.c | 8 ++------
arch/x86/mm/k8topology_64.c | 1 -
4 files changed, 4 insertions(+), 30 deletions(-)

Index: linux-2.6/arch/x86/mm/k8topology_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/k8topology_64.c
+++ linux-2.6/arch/x86/mm/k8topology_64.c
@@ -64,7 +64,6 @@ static __init void early_get_boot_cpu_id
if (smp_found_config)
early_get_smp_config();
#endif
- early_init_lapic_mapping();
}

int __init k8_get_nodes(struct bootnode *physnodes)
Index: linux-2.6/arch/x86/kernel/apic/apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/apic.c
+++ linux-2.6/arch/x86/kernel/apic/apic.c
@@ -1560,28 +1560,6 @@ no_apic:
}
#endif

-#ifdef CONFIG_X86_64
-void __init early_init_lapic_mapping(void)
-{
- /*
- * If no local APIC can be found then go out
- * : it means there is no mpatable and MADT
- */
- if (!smp_found_config)
- return;
-
- set_fixmap_nocache(FIX_APIC_BASE, mp_lapic_addr);
- apic_printk(APIC_VERBOSE, "mapped APIC to %16lx (%16lx)\n",
- APIC_BASE, mp_lapic_addr);
-
- /*
- * Fetch the APIC ID of the BSP in case we have a
- * default configuration (or the MP table is broken).
- */
- boot_cpu_physical_apicid = read_apic_id();
-}
-#endif
-
/**
* init_apic_mappings - initialize APIC mappings
*/
@@ -1637,6 +1615,8 @@ void __init smp_register_lapic_address(u
mp_lapic_addr = address;

set_fixmap_nocache(FIX_APIC_BASE, address);
+ apic_printk(APIC_VERBOSE, "mapped APIC to %16lx (%16lx)\n",
+ APIC_BASE, mp_lapic_addr);
if (boot_cpu_physical_apicid == -1U) {
boot_cpu_physical_apicid = read_apic_id();
apic_version[boot_cpu_physical_apicid] =
Index: linux-2.6/arch/x86/kernel/mpparse.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/mpparse.c
+++ linux-2.6/arch/x86/kernel/mpparse.c
@@ -288,17 +288,13 @@ static int __init smp_read_mpc(struct mp
#ifdef CONFIG_X86_32
generic_mps_oem_check(mpc, oem, str);
#endif
- /* save the local APIC address, it might be non-default */
+ /* Initialize the lapic mapping */
if (!acpi_lapic)
- mp_lapic_addr = mpc->lapic;
+ smp_register_lapic_address(mpc->lapic);

if (early)
return 1;

- /* Initialize the lapic mapping */
- if (!acpi_lapic)
- smp_register_lapic_address(mpc->lapic);
-
if (mpc->oemptr)
x86_init.mpparse.smp_read_mpc_oem(mpc);

Index: linux-2.6/arch/x86/include/asm/apic.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/apic.h
+++ linux-2.6/arch/x86/include/asm/apic.h
@@ -244,7 +244,6 @@ extern void enable_NMI_through_LVT0(void
* On 32bit this is mach-xxx local
*/
#ifdef CONFIG_X86_64
-extern void early_init_lapic_mapping(void);
extern int apic_is_clustered_box(void);
#else
static inline int apic_is_clustered_box(void)

2010-08-06 00:17:39

by Eric W. Biederman

Subject: [tip:x86/urgent] x86, apic: Map the local apic when parsing the MP table.

Commit-ID: 5989cd6a1cbf86587edcc856791f960978087311
Gitweb: http://git.kernel.org/tip/5989cd6a1cbf86587edcc856791f960978087311
Author: Eric W. Biederman <[email protected]>
AuthorDate: Wed, 4 Aug 2010 13:30:27 -0700
Committer: H. Peter Anvin <[email protected]>
CommitDate: Thu, 5 Aug 2010 16:26:42 -0700

x86, apic: Map the local apic when parsing the MP table.

This fixes a regression in 2.6.35 from 2.6.34, that is
present for select models of Intel cpus when people are
using an MP table.

The commit cf7500c0ea133d66f8449d86392d83f840102632
"x86, ioapic: In mpparse use mp_register_ioapic" started
calling mp_register_ioapic from MP_ioapic_info. An extremely
simple change that was obviously correct. Unfortunately
mp_register_ioapic did just a little more than the previous
hand crafted code and so we gained this call path.

The problem call path is:
MP_ioapic_info()
mp_register_ioapic()
io_apic_unique_id()
io_apic_get_unique_id()
get_physical_broadcast()
modern_apic()
lapic_get_version()
apic_read(APIC_LVR)

Which turned out to be a problem because the local apic
was not mapped, at that point, unlike the similar point
in the ACPI parsing code.

This problem is fixed by mapping the local apic when
parsing the mptable as soon as we reasonably can.

Looking at the number of places we setup the fixmap for
the local apic, I see some serious simplification opportunities.
For the moment except for not duplicating the setting up of the
fixmap in init_apic_mappings, I have not acted on them.

The regression from 2.6.34 is tracked in bug
https://bugzilla.kernel.org/show_bug.cgi?id=16173

Cc: <[email protected]> 2.6.35
Reported-by: David Hill <[email protected]>
Reported-by: Tvrtko Ursulin <[email protected]>
Tested-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/kernel/apic/apic.c | 2 +-
arch/x86/kernel/mpparse.c | 16 ++++++++++++++++
2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a96489e..c07e513 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1606,7 +1606,7 @@ void __init init_apic_mappings(void)
* acpi lapic path already maps that address in
* acpi_register_lapic_address()
*/
- if (!acpi_lapic)
+ if (!acpi_lapic && !smp_found_config)
set_fixmap_nocache(FIX_APIC_BASE, apic_phys);

apic_printk(APIC_VERBOSE, "mapped APIC to %08lx (%08lx)\n",
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index d86dbf7..d7b6f7f 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -274,6 +274,18 @@ static void __init smp_dump_mptable(struct mpc_table *mpc, unsigned char *mpt)

void __init default_smp_read_mpc_oem(struct mpc_table *mpc) { }

+static void __init smp_register_lapic_address(unsigned long address)
+{
+ mp_lapic_addr = address;
+
+ set_fixmap_nocache(FIX_APIC_BASE, address);
+ if (boot_cpu_physical_apicid == -1U) {
+ boot_cpu_physical_apicid = read_apic_id();
+ apic_version[boot_cpu_physical_apicid] =
+ GET_APIC_VERSION(apic_read(APIC_LVR));
+ }
+}
+
static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
{
char str[16];
@@ -295,6 +307,10 @@ static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
if (early)
return 1;

+ /* Initialize the lapic mapping */
+ if (!acpi_lapic)
+ smp_register_lapic_address(mpc->lapic);
+
if (mpc->oemptr)
x86_init.mpparse.smp_read_mpc_oem(mpc);
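
The ordering problem described in the commit message above can be illustrated with a small user-space model. This is only a sketch, not kernel code: every model_* name and the version value are invented for illustration, a boolean stands in for the FIX_APIC_BASE fixmap, and a false return stands in for what the real kernel does when it reads the unmapped local apic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up version value; the real one comes from the APIC_LVR register. */
#define MODEL_APIC_VERSION 0x14u

/* Stands in for "the local apic is mapped at FIX_APIC_BASE". */
static bool model_lapic_mapped;

/* Stand-in for smp_register_lapic_address(): establish the mapping. */
static void model_register_lapic(void)
{
	model_lapic_mapped = true;
}

/* Stand-in for apic_read(APIC_LVR): only meaningful once mapped. */
static bool model_apic_read_version(unsigned int *version)
{
	assert(version != NULL);
	if (!model_lapic_mapped)
		return false;	/* the real kernel has no such safety net */
	*version = MODEL_APIC_VERSION;
	return true;
}

/* Stand-in for the MP_ioapic_info() ... apic_read(APIC_LVR) call chain. */
static bool model_ioapic_info(void)
{
	unsigned int version;

	return model_apic_read_version(&version);
}

/*
 * Stand-in for smp_read_mpc(). With map_first set, the lapic is mapped
 * before the ioapic entries are processed, as in the fix.
 */
static bool model_smp_read_mpc(bool map_first)
{
	model_lapic_mapped = false;
	if (map_first)
		model_register_lapic();
	return model_ioapic_info();
}
```

Calling model_smp_read_mpc(false) models the 2.6.35 ordering and fails the read; model_smp_read_mpc(true) models the patched ordering and succeeds.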

2010-08-07 00:09:12

by Yinghai Lu

Subject: Re: [tip:x86/urgent] x86, apic: Map the local apic when parsing the MP table.

On 08/05/2010 05:15 PM, tip-bot for Eric W. Biederman wrote:
> Commit-ID: 5989cd6a1cbf86587edcc856791f960978087311
> Gitweb: http://git.kernel.org/tip/5989cd6a1cbf86587edcc856791f960978087311
> Author: Eric W. Biederman <[email protected]>
> AuthorDate: Wed, 4 Aug 2010 13:30:27 -0700
> Committer: H. Peter Anvin <[email protected]>
> CommitDate: Thu, 5 Aug 2010 16:26:42 -0700
>
> x86, apic: Map the local apic when parsing the MP table.
>
> This fixes a regression in 2.6.35 from 2.6.34, that is
> present for select models of Intel cpus when people are
> using an MP table.
>
> The commit cf7500c0ea133d66f8449d86392d83f840102632
> "x86, ioapic: In mpparse use mp_register_ioapic" started
> calling mp_register_ioapic from MP_ioapic_info. An extremely
> simple change that was obviously correct. Unfortunately
> mp_register_ioapic did just a little more than the previous
> hand crafted code and so we gained this call path.
>
> The problem call path is:
> MP_ioapic_info()
> mp_register_ioapic()
> io_apic_unique_id()
> io_apic_get_unique_id()
> get_physical_broadcast()
> modern_apic()
> lapic_get_version()
> apic_read(APIC_LVR)
>
> Which turned out to be a problem because the local apic
> was not mapped, at that point, unlike the similar point
> in the ACPI parsing code.
>
> This problem is fixed by mapping the local apic when
> parsing the mptable as soon as we reasonably can.
>
> Looking at the number of places we setup the fixmap for
> the local apic, I see some serious simplification opportunities.
> For the moment except for not duplicating the setting up of the
> fixmap in init_apic_mappings, I have not acted on them.
>
> The regression from 2.6.34 is tracked in bug
> https://bugzilla.kernel.org/show_bug.cgi?id=16173
>
> Cc: <[email protected]> 2.6.35
> Reported-by: David Hill <[email protected]>
> Reported-by: Tvrtko Ursulin <[email protected]>
> Tested-by: Tvrtko Ursulin <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>
> LKML-Reference: <[email protected]>
> Signed-off-by: H. Peter Anvin <[email protected]>
> ---
> arch/x86/kernel/apic/apic.c | 2 +-
> arch/x86/kernel/mpparse.c | 16 ++++++++++++++++
> 2 files changed, 17 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index a96489e..c07e513 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1606,7 +1606,7 @@ void __init init_apic_mappings(void)
> * acpi lapic path already maps that address in
> * acpi_register_lapic_address()
> */
> - if (!acpi_lapic)
> + if (!acpi_lapic && !smp_found_config)
> set_fixmap_nocache(FIX_APIC_BASE, apic_phys);
>
> apic_printk(APIC_VERBOSE, "mapped APIC to %08lx (%08lx)\n",

This change is not needed; it will break two cases:
1. An mptable is found, but the default construct path is used.
2. The visws path: an mptable is found, but get_smp_config is not called.

YH

2010-08-07 00:16:13

by H. Peter Anvin

Subject: Re: [tip:x86/urgent] x86, apic: Map the local apic when parsing the MP table.

On 08/06/2010 05:08 PM, Yinghai Lu wrote:
> This change is not needed; it will break two cases:
> 1. An mptable is found, but the default construct path is used.
> 2. The visws path: an mptable is found, but get_smp_config is not called.
>
> YH

I'm not sure the above is decipherable. Please provide an incremental
patch with a more detailed description.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-08-07 00:52:58

by Yinghai Lu

Subject: Re: [tip:x86/urgent] x86, apic: Map the local apic when parsing the MP table.

On 08/06/2010 05:15 PM, H. Peter Anvin wrote:
> On 08/06/2010 05:08 PM, Yinghai Lu wrote:
>> This change is not needed; it will break two cases:
>> 1. An mptable is found, but the default construct path is used.
>> 2. The visws path: an mptable is found, but get_smp_config is not called.
>>
>> YH
>
> I'm not sure the above is decipherable. Please provide an incremental
> patch with a more detailed description.
>
Please check:

[PATCH] x86: Fix lapic mapping with construct ISA and visws mptable path

We do need to set up the lapic mapping for these two paths.

In arch/x86/kernel/visws_quirks.c:
we only have visws_find_smp_config() to set mp_lapic_addr to APIC_DEFAULT_PHYS_BASE;
visws_get_smp_config() is a nop.
default_get_smp_config()/check_physptr()/smp_read_mpc() are not called on this path,
so smp_register_lapic_address() is not called and the lapic is not mapped.

In arch/x86/kernel/mpparse.c:
if mpf->feature1 != 0, it goes through construct_default_ISA_mptable() instead
of the check_physptr() path, so smp_register_lapic_address() is not called.

Both of these paths have smp_found_config set.

So let's remove the !smp_found_config check.

Setting the fixmap twice does not hurt.

Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/kernel/apic/apic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/arch/x86/kernel/apic/apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/apic.c
+++ linux-2.6/arch/x86/kernel/apic/apic.c
@@ -1606,7 +1606,7 @@ void __init init_apic_mappings(void)
* acpi lapic path already maps that address in
* acpi_register_lapic_address()
*/
- if (!acpi_lapic && !smp_found_config)
+ if (!acpi_lapic)
set_fixmap_nocache(FIX_APIC_BASE, apic_phys);

apic_printk(APIC_VERBOSE, "mapped APIC to %08lx (%08lx)\n",
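
The two cases in the description above can be sketched as a small user-space model. This is an illustration only, not kernel code: the model_* names are invented, the struct fields merely mirror the kernel variables acpi_lapic and smp_found_config, and "mapped" stands in for the fixmap at FIX_APIC_BASE. It assumes a non-ACPI boot in which an MP table was found.

```c
#include <assert.h>
#include <stdbool.h>

/* How the MP configuration was discovered (names are hypothetical). */
enum model_path {
	MODEL_PATH_MPC_PARSED,	/* smp_read_mpc() ran and mapped the lapic */
	MODEL_PATH_DEFAULT_ISA,	/* mpf->feature1 != 0, default table built */
	MODEL_PATH_VISWS,	/* visws: the get_smp_config hook is a nop */
};

struct model_state {
	bool acpi_lapic;
	bool smp_found_config;
	bool mapped;
};

/* State on entry to init_apic_mappings() for a non-ACPI boot. */
static struct model_state model_discover(enum model_path path)
{
	struct model_state s = {
		.acpi_lapic = false,
		.smp_found_config = true,	/* set on all three paths */
		.mapped = (path == MODEL_PATH_MPC_PARSED),
	};

	assert(path >= MODEL_PATH_MPC_PARSED && path <= MODEL_PATH_VISWS);
	return s;
}

/*
 * Model of the guard in init_apic_mappings(). With
 * check_smp_found_config set it behaves like
 * "if (!acpi_lapic && !smp_found_config)", otherwise like
 * "if (!acpi_lapic)".
 */
static bool model_init_apic_mappings(struct model_state s,
				     bool check_smp_found_config)
{
	bool skip = s.acpi_lapic ||
		    (check_smp_found_config && s.smp_found_config);

	if (!skip)
		s.mapped = true;
	return s.mapped;
}

/* Is the lapic mapped once init_apic_mappings() has run? */
static bool model_boot(enum model_path path, bool check_smp_found_config)
{
	return model_init_apic_mappings(model_discover(path),
					check_smp_found_config);
}
```

With the extra !smp_found_config check, only the parsed-MP-table path ends up mapped in this model; without it, all three paths do.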

2010-08-07 01:08:16

by Eric W. Biederman

Subject: Re: [tip:x86/urgent] x86, apic: Map the local apic when parsing the MP table.

"H. Peter Anvin" <[email protected]> writes:

> On 08/06/2010 05:08 PM, Yinghai Lu wrote:
>> This change is not needed; it will break two cases:
>> 1. An mptable is found, but the default construct path is used.
>> 2. The visws path: an mptable is found, but get_smp_config is not called.
>>
>> YH
>
> I'm not sure the above is decipherable. Please provide an incremental
> patch with a more detailed description.

YH was saying I overoptimized, and it looks like he is right,
although there are only one or two machines in existence that
are likely to be affected.

Untested patch to remove the cleverness below. If it boots, all
is well.

---

From: Eric W. Biederman <[email protected]>
Date: Fri, 6 Aug 2010 18:00:12 -0700
Subject: [PATCH] x86/apic: Always map the local apic in init_apic_mappings.

In all of the common cases we currently map the local apic before
we get to init_apic_mappings. Unfortunately there are still a few
weird subarch code paths that require us to map the local apic in
init_apic_mappings, and those subarchitectures set smp_found_config.

So just unconditionally run the fixmap code, and stop being clever.

Signed-off-by: Eric W. Biederman <[email protected]>
---
arch/x86/kernel/apic/apic.c | 13 +++++++++----
1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index c07e513..ad96090 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1603,11 +1603,16 @@ void __init init_apic_mappings(void)
apic_phys = mp_lapic_addr;

/*
- * acpi lapic path already maps that address in
- * acpi_register_lapic_address()
+ * acpi and mptable paths already fixmap mp_lapic_addr
+ * at FIX_APIC_BASE but perform the fixmap anyway
+ * because our initialization code is spaghetti and
+ * there are weird subarchitectures that do something
+ * different. The double fixmap isn't particularly
+ * expensive and always running the code should prevent
+ * bitrot.
+ *
*/
- if (!acpi_lapic && !smp_found_config)
- set_fixmap_nocache(FIX_APIC_BASE, apic_phys);
+ set_fixmap_nocache(FIX_APIC_BASE, apic_phys);

apic_printk(APIC_VERBOSE, "mapped APIC to %08lx (%08lx)\n",
APIC_BASE, apic_phys);
--
1.6.5.2.143.g8cc62

2010-08-07 01:22:23

by H. Peter Anvin

Subject: Re: [tip:x86/urgent] x86, apic: Map the local apic when parsing the MP table.

On 08/06/2010 06:08 PM, Eric W. Biederman wrote:
>>
>> I'm not sure the above is decipherable. Please provide an incremental
>> patch with a more detailed description.
>
> YH was saying I overoptimized, and it looks like he is right,
> although there are only one or two machines in existence that
> are likely to be affected.
>
> Untested patch to remove the cleverness below. If it boots, all
> is well.
>

This makes sense to me. Yinghai, do you have a system that is actually
affected, and if so, could you test this patch?

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-08-07 01:30:50

by Yinghai Lu

Subject: Re: [tip:x86/urgent] x86, apic: Map the local apic when parsing the MP table.

On 08/06/2010 06:21 PM, H. Peter Anvin wrote:
> On 08/06/2010 06:08 PM, Eric W. Biederman wrote:
>>>
>>> I'm not sure the above is decipherable. Please provide an incremental
>>> patch with a more detailed description.
>>
>> YH was saying I overoptimized, and it looks like he is right,
>> although there are only one or two machines in existence that
>> are likely to be affected.
>>
>> Untested patch to remove the cleverness below. If it boots, all
>> is well.
>>
>
> This makes sense to me. Yinghai, do you have a system that is actually
> affected, and if so, could you test this patch?

No, I don't have that kind of system.

I found it when I was preparing more smp_register_lapic_address patches.

I suggest we still keep the !acpi_lapic check; that should always be right.

Yinghai

2010-08-07 02:49:15

by Eric W. Biederman

Subject: Re: [tip:x86/urgent] x86, apic: Map the local apic when parsing the MP table.

Yinghai Lu <[email protected]> writes:

> On 08/06/2010 06:21 PM, H. Peter Anvin wrote:
>> On 08/06/2010 06:08 PM, Eric W. Biederman wrote:
>>>>
>>>> I'm not sure the above is decipherable. Please provide an incremental
>>>> patch with a more detailed description.
>>>
>>> YH was saying I overoptimized, and it looks like he is right,
>>> although there are only one or two machines in existence that
>>> are likely to be affected.
>>>
>>> Untested patch to remove the cleverness below. If it boots, all
>>> is well.
>>>
>>
>> This makes sense to me. Yinghai, do you have a system that is actually
>> affected, and if so, could you test this patch?
>
> No, I don't have that kind of system.

I don't know if anyone does. It looks like sfi (aka moorestown)
and visws are what are affected.

That is why I made a patch whose code is exercised on any boot
that uses a local apic.

Arguably it is best to just remove that hunk from my patch, so
we have something that is safe to backport to 2.6.35.1.

> I found it when I was preparing more smp_register_lapic_address patches.
>
> I suggest we still keep the !acpi_lapic check; that should always be right.

Ultimately we want to remove the code duplication entirely.

Eric