Some systems lock up without the noapic option. I found one
that will freeze while trying to set up the timer interrupt.
Passing 'nolapic' makes it freeze just after:
Setting up timer through ExtINT... works
Sometimes it will boot up and then freeze during the startup
scripts. Passing the noapic option fixes all that, but it
then gets 1000 spurious interrupts per second on IRQ7 (the only
device listed on it is ehci). Kernel version is 2.6.22.
Chuck Ebbert <[email protected]> writes:
> Some systems lock up without the noapic option.
Please find patterns: cpu type, chipsets, mainboard vendors etc.
> I found one
> that will freeze while trying to set up the timer interrupt.
> Passing 'nolapic' makes it freeze just after:
>
> Setting up timer through ExtINT... works
Always boot with apic=debug
That message means the primary timer setup methods have already failed.
ExtINT is really a crappy fallback that was originally only
needed for some early SMP systems where the timer was not wired
according to specs.
But the real problem is that the standard timer access method
through the local APIC didn't work.
I had a rewrite of the timer probing some time ago that tried
more combinations automatically. It had some problems so it
never went in, but perhaps it's worth revisiting.
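Before comparing reports it helps to know which of the options discussed in this thread were actually in effect for a given boot. The sketch below is a hypothetical helper: the name apic_options and the short option descriptions are assumptions paraphrased from this thread, not taken from the kernel's parameter documentation. On a running system the input string would come from /proc/cmdline.

```python
# Hypothetical helper; the descriptions are paraphrased from this thread,
# not copied from the kernel's parameter documentation.
APIC_OPTIONS = {
    "noapic": "do not use the I/O APIC",
    "nolapic": "do not use the local APIC",
    "lapic": "force-enable the local APIC even if the BIOS disabled it",
    "apic": "APIC message verbosity, e.g. apic=debug",
    "acpi_use_timer_override": "use the ACPI timer override",
}


def apic_options(cmdline: str) -> dict:
    """Return the APIC/timer options present in a kernel command line.

    Bare flags map to "", key=value options map to their value.
    Quoted arguments containing spaces are not handled.
    """
    found = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key in APIC_OPTIONS:
            found[key] = value
    return found


# Example with an illustrative command line; on a live system,
# read the real one with open("/proc/cmdline").read().
for key, value in apic_options("ro quiet noapic apic=debug").items():
    shown = f"{key}={value}" if value else key
    print(f"{shown}: {APIC_OPTIONS[key]}")
```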
-Andi
On 09/06/2007 07:31 AM, Andi Kleen wrote:
> Chuck Ebbert <[email protected]> writes:
>
>> Some systems lock up without the noapic option.
>
> Please find patterns: cpu type, chipsets, mainboard vendors etc.
>
This is the first one I've actually had in front of me:
HP TX1000 notebook
Nvidia C51/MCP51 mobile chipset
Booting with "noapic" gives some very strange results. These are two
snapshots of /proc/interrupts taken one second apart. It almost looks
like timer interrupts are occurring on IRQ 0 and IRQ7 on different
CPUs:
CPU0 CPU1
0: 446096 6224 XT-PIC-XT timer
1: 342 6 XT-PIC-XT i8042
2: 0 0 XT-PIC-XT cascade
5: 3099 865 XT-PIC-XT sata_nv
7: 8145 494718 XT-PIC-XT ehci_hcd:usb2
8: 0 0 XT-PIC-XT rtc0
9: 323 9 XT-PIC-XT acpi
10: 136 36 XT-PIC-XT HDA Intel
11: 43884 1091 XT-PIC-XT ohci_hcd:usb1, eth0
12: 104 19 XT-PIC-XT i8042
14: 1011 25 XT-PIC-XT libata
15: 0 0 XT-PIC-XT libata
NMI: 0 0
LOC: 6212 445951
ERR: 403241
MIS: 0
CPU0 CPU1
0: 447098 6233 XT-PIC-XT timer
1: 343 6 XT-PIC-XT i8042
2: 0 0 XT-PIC-XT cascade
5: 3100 865 XT-PIC-XT sata_nv
7: 8158 495847 XT-PIC-XT ehci_hcd:usb2
8: 0 0 XT-PIC-XT rtc0
9: 323 9 XT-PIC-XT acpi
10: 136 36 XT-PIC-XT HDA Intel
11: 43988 1094 XT-PIC-XT ohci_hcd:usb1, eth0
12: 104 19 XT-PIC-XT i8042
14: 1032 26 XT-PIC-XT libata
15: 0 0 XT-PIC-XT libata
NMI: 0 0
LOC: 6221 446953
ERR: 404383
MIS: 0
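Subtracting the first snapshot from the second gives per-second rates. A minimal sketch, assuming the usual /proc/interrupts layout (a CPU header line, then one row per source with one count per CPU, except ERR and MIS, which carry a single total); the function names are hypothetical, and only a few rows of the snapshots above are included.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: [counts]}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header: "CPU0 CPU1 ..."
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        nums = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():  # ERR/MIS rows have a single count
                break
            nums.append(int(field))
        counts[label.strip()] = nums
    return counts


def per_second(before, after, interval=1.0):
    """Per-CPU rates between two snapshots taken `interval` seconds apart."""
    a, b = parse_interrupts(before), parse_interrupts(after)
    return {
        label: [(new - old) / interval for old, new in zip(a[label], b[label])]
        for label in b
        if label in a
    }


# Rows copied from the two noapic snapshots above.
BEFORE = """\
           CPU0       CPU1
  0:     446096       6224    XT-PIC-XT        timer
  7:       8145     494718    XT-PIC-XT        ehci_hcd:usb2
LOC:       6212     445951
ERR:     403241
"""
AFTER = """\
           CPU0       CPU1
  0:     447098       6233    XT-PIC-XT        timer
  7:       8158     495847    XT-PIC-XT        ehci_hcd:usb2
LOC:       6221     446953
ERR:     404383
"""

for label, rates in per_second(BEFORE, AFTER).items():
    print(label, rates)
```

With these numbers IRQ 0 runs at about 1002/s on CPU0 and LOC at about 1002/s on CPU1, while IRQ 7 totals 13 + 1129 = 1142/s, the same as the ERR delta (404383 - 403241 = 1142). That is consistent with the IRQ 7 traffic being the spurious interrupts counted in ERR.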
>> I found one
>> that will freeze while trying to set up the timer interrupt.
>> Passing 'nolapic' makes it freeze just after:
>>
>> Setting up timer through ExtINT... works
>
> Always boot with apic=debug
>
I can't capture the messages. Even when it boots it doesn't last
long enough to get them.
Chuck Ebbert wrote:
> On 09/06/2007 07:31 AM, Andi Kleen wrote:
> > Chuck Ebbert <[email protected]> writes:
> >> Some systems lock up without the noapic option.
> >
> > Please find patterns: cpu type, chipsets, mainboard vendors etc.
>
> This is the first one I've actually had in front of me:
>
> HP TX1000 notebook
> Nvidia C51/MCP51 mobile chipset
>
> Booting with "noapic" gives some very strange results. These are two
> snapshots of /proc/interrupts taken one second apart. It almost looks
> like timer interrupts are occurring on IRQ 0 and IRQ7 on different
> CPUs:
>
> CPU0 CPU1
> 0: 446096 6224 XT-PIC-XT timer
> 1: 342 6 XT-PIC-XT i8042
> 2: 0 0 XT-PIC-XT cascade
> 5: 3099 865 XT-PIC-XT sata_nv
> 7: 8145 494718 XT-PIC-XT ehci_hcd:usb2
> 8: 0 0 XT-PIC-XT rtc0
> 9: 323 9 XT-PIC-XT acpi
> 10: 136 36 XT-PIC-XT HDA Intel
> 11: 43884 1091 XT-PIC-XT ohci_hcd:usb1, eth0
> 12: 104 19 XT-PIC-XT i8042
> 14: 1011 25 XT-PIC-XT libata
> 15: 0 0 XT-PIC-XT libata
> NMI: 0 0
> LOC: 6212 445951
> ERR: 403241
> MIS: 0
You may want to try reconfiguring your BIOS to reserve IRQ 5/7 for ISA only.
Then post /proc/interrupts again.
Thanks!
--
Al
On the day of Friday 07 September 2007 Chuck Ebbert hast written:
> On 09/06/2007 07:31 AM, Andi Kleen wrote:
> > Chuck Ebbert <[email protected]> writes:
> >> Some systems lock up without the noapic option.
> >
> > Please find patterns: cpu type, chipsets, mainboard vendors etc.
>
> This is the first one I've actually had in front of me:
>
> HP TX1000 notebook
> Nvidia C51/MCP51 mobile chipset
Do you have an HPET? If not, have you tried using acpi_use_timer_override
with the APIC enabled?
bye,
--
Prakash Punnoor
On 09/08/2007 01:17 AM, Prakash Punnoor wrote:
> On the day of Friday 07 September 2007 Chuck Ebbert hast written:
>> On 09/06/2007 07:31 AM, Andi Kleen wrote:
>>> Chuck Ebbert <[email protected]> writes:
>>>> Some systems lock up without the noapic option.
>>> Please find patterns: cpu type, chipsets, mainboard vendors etc.
>> This is the first one I've actually had in front of me:
>>
>> HP TX1000 notebook
>> Nvidia C51/MCP51 mobile chipset
>
> Do you have a hpet? If not, have you tried using acpi_use_timer_override with
> apic?
Yes, it has an hpet. And I tried every combination of options I could
think of.
But, even stranger, x86_64 works (only i386 fails).
>
> Yes, it has an hpet. And I tried every combination of options I could
> think of.
>
> But, even stranger, x86_64 works (only i386 fails.)
x86-64 has quite different time code (at least until the dyntick patches
currently in -mm).
The obvious thing would be to diff the boot messages and see if anything
jumps out (e.g. in interrupt routing).
Or check with -mm, and if x86-64 is broken there too, then it's likely
the new time code.
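Diffing the boot messages can be scripted with Python's standard difflib module. This is a sketch under stated assumptions: the keyword filter (APIC, IRQ, timer, ExtINT, HPET) is an arbitrary choice to cut noise, printk timestamps are stripped if present so identical messages compare equal, and the log file names in the usage comment are hypothetical.

```python
import difflib
import re

# Assumed keyword filter: keep only interrupt-routing and timer messages.
INTERESTING = re.compile(r"APIC|IRQ|timer|ExtINT|HPET", re.IGNORECASE)
# Strip a leading "[    0.123456] " printk timestamp, if present.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def interesting_lines(log):
    lines = (TIMESTAMP.sub("", line) for line in log.splitlines())
    return [line for line in lines if INTERESTING.search(line)]


def diff_boot_logs(good, bad):
    """Unified diff of the filtered messages of a good and a bad boot."""
    return list(
        difflib.unified_diff(
            interesting_lines(good),
            interesting_lines(bad),
            fromfile="good",
            tofile="bad",
            lineterm="",
        )
    )


# Hypothetical usage with two saved dmesg outputs:
# with open("good.log") as g, open("bad.log") as b:
#     print("\n".join(diff_boot_logs(g.read(), b.read())))
```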
-Andi
On 09/10/2007 03:44 PM, Andi Kleen wrote:
>> Yes, it has an hpet. And I tried every combination of options I could
>> think of.
>
>> But, even stranger, x86_64 works (only i386 fails.)
>
> x86-64 has quite different time code (at least until the dyntick patches
> currently in mm)
>
> Obvious thing would be to diff the boot messages and see if anything
> jumps out (e.g. in interrupt routing).
>
> Or check with mm and if x86-64 is broken there too then it's likely
> the new time code.
This is Fedora 8 and it already has the highres-timers code in x86_64.
But I was still comparing 2.6.22 on i386 to 2.6.23-rc5-git1 + highres-timers
on x86_64. 2.6.23-rc5 on i386 seems okay too, so whatever is happening, it
only occurs on 2.6.22 here.
On 09/10/2007 03:44 PM, Andi Kleen wrote:
>> Yes, it has an hpet. And I tried every combination of options I could
>> think of.
>
>> But, even stranger, x86_64 works (only i386 fails.)
>
> x86-64 has quite different time code (at least until the dyntick patches
> currently in mm)
>
> Obvious thing would be to diff the boot messages and see if anything
> jumps out (e.g. in interrupt routing).
>
> Or check with mm and if x86-64 is broken there too then it's likely
> the new time code.
I reported too soon that x86_64 works. It does not work, it just takes
a bit longer before it freezes. There are message threads all over the
place discussing this problem with the HP Pavilion tx 1000, and it seems
the best workaround is to use the "nolapic" option instead of "noapic".
Using that, it is totally stable _and_ there are no spurious interrupts
that would otherwise break USB. Interrupt setup is a bit strange, though:
CPU0 CPU1
0: 241 0 XT-PIC-XT timer
1: 1 736 IO-APIC-edge i8042
2: 0 0 XT-PIC-XT cascade
5: 14 10028 IO-APIC-edge sata_nv
7: 0 57 IO-APIC-edge ehci_hcd:usb1
8: 0 0 IO-APIC-edge rtc0
9: 4 2463 IO-APIC-edge acpi
10: 2 2795 IO-APIC-edge HDA Intel
11: 740 478806 IO-APIC-edge ohci_hcd:usb2, eth0
12: 42 19911 IO-APIC-edge i8042
14: 5 7958 IO-APIC-edge libata
15: 0 0 IO-APIC-edge libata
NMI: 0 0
LOC: 4617310 4617213
ERR: 0
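The oddity is easier to see when the numbered IRQs are grouped by the chip column: with nolapic, IRQ 0 (timer) and IRQ 2 (cascade) stay on the XT-PIC while the rest are routed through the IO-APIC. A minimal sketch, assuming the chip name is the column right after the per-CPU counts; the function name is hypothetical and only a subset of the rows above is included.

```python
from collections import defaultdict


def irqs_by_chip(text):
    """Group numbered IRQ rows of /proc/interrupts by their chip column."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header: "CPU0 CPU1 ..."
    groups = defaultdict(list)
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():  # skip NMI, LOC, ERR, ...
            continue
        chip = rest.split()[ncpus]  # assumed: chip follows the per-CPU counts
        groups[chip].append(int(label))
    return dict(groups)


# A subset of the rows from the nolapic boot above.
TABLE = """\
           CPU0       CPU1
  0:        241          0    XT-PIC-XT        timer
  1:          1        736    IO-APIC-edge     i8042
  2:          0          0    XT-PIC-XT        cascade
  5:         14      10028    IO-APIC-edge     sata_nv
  7:          0         57    IO-APIC-edge     ehci_hcd:usb1
 11:        740     478806    IO-APIC-edge     ohci_hcd:usb2, eth0
NMI:          0          0
LOC:    4617310    4617213
"""

print(irqs_by_chip(TABLE))
```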
On 06 Sep 2007 13:31:50 +0200 Andi Kleen <[email protected]> wrote:
> Chuck Ebbert <[email protected]> writes:
>
> > Some systems lock up without the noapic option.
>
> Please find patterns: cpu type, chipsets, mainboard vendors etc.
There are 48 bugs in bugzilla which mention "noapic"
http://bugzilla.kernel.org/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&long_desc_type=substring&long_desc=noapic&kernel_version_type=allwordssubstr&kernel_version=&bug_status=NEW&bug_status=REOPENED&bug_status=ASSIGNED&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&regression=both&cmdtype=doit&order=Reuse+same+sort+as+last+time&field0-0-0=noop&type0-0-0=noop&value0-0-0=
And there are 173,000 on the internet ;)
http://www.google.com/search?hl=en&q=linux+noapic&btnG=Google+Search
We screwed this pooch a long time ago - years. Perhaps if some of the many
noapic users could run a bisection search to work out when it broke we
could start fixing things. But they all have a workaround so there's no
motivation.
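The bisection asked for here is a binary search over ordered kernel versions: each test boot halves the range still in doubt, so about log2(N) boots suffice for N candidates. In practice git bisect does this over commits; the sketch below only illustrates the search itself. It assumes the oldest candidate is known good, the newest is known bad, and the breakage persists once it appears; boots_without_noapic is a hypothetical stand-in for a manual test boot.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str],
              boots_without_noapic: Callable[[str], bool]) -> str:
    """Return the first version in `versions` that needs noapic.

    Assumes versions[0] is good, versions[-1] is bad, and the
    breakage persists in every later version once it appears.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots_without_noapic(versions[mid]):
            good = mid  # still works: the break is later
        else:
            bad = mid   # already broken: the break is here or earlier
    return versions[bad]


# Hypothetical example: pretend everything from 2.6.10 on needs noapic.
candidates = [f"2.6.{n}" for n in range(8, 23)]
print(first_bad(candidates, lambda v: int(v.split(".")[2]) < 10))
```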
On Saturday 15 September 2007, Andrew Morton wrote:
> There are 48 bugs in bugzilla which mention "noapic"
>
> http://bugzilla.kernel.org/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&long_desc_type=substring&long_desc=noapic&kernel_version_type=allwordssubstr&kernel_version=&bug_status=NEW&bug_status=REOPENED&bug_status=ASSIGNED&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&regression=both&cmdtype=doit&order=Reuse+same+sort+as+last+time&field0-0-0=noop&type0-0-0=noop&value0-0-0=
>
> And there are 173,000 on the internet ;)
> http://www.google.com/search?hl=en&q=linux+noapic&btnG=Google+Search
>
> We screwed this pooch a long time ago - years. Perhaps if some of the many
> noapic users could run a bisection search to work out when it broke we
> could start fixing things. But they all have a workaround so there's no
> motivation.
I have 2 SMP boards and both need noapic. One is from 2001 (ASUS CUR-DLS),
one is from June 2006 (Gigabyte M57SLI-S4).
There are many reasons:
1. Bugs which have such a simple workaround don't get much attention.
2. SMP boards are usually used for machines that just HAVE to work,
since they are expensive. These are not consumer boards.
3. I usually had only USB problems (no IRQ) when omitting noapic.
USB is a consumer-grade technology, and enterprise-grade
developers don't have much interest in it (until now?).
4. IRQ routing setup is often a BIOS issue. You might be able
to fix that by upgrading your BIOS. That often needs a Windows
tool. Linux people don't always have (or want) access to Windows :-)
I reported all the problems (starting in 2001), but no developer
seemed interested.
I can report them against the latest RC6 kernel tomorrow and put them
into bugzilla, if we now REALLY care.
Best Regards
Ingo Oeser
On Sat, 15 Sep 2007 12:58:27 +0200 Ingo Oeser <[email protected]> wrote:
> On Saturday 15 September 2007, Andrew Morton wrote:
> > There are 48 bugs in bugzilla which mention "noapic"
> >
> > http://bugzilla.kernel.org/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&long_desc_type=substring&long_desc=noapic&kernel_version_type=allwordssubstr&kernel_version=&bug_status=NEW&bug_status=REOPENED&bug_status=ASSIGNED&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&regression=both&cmdtype=doit&order=Reuse+same+sort+as+last+time&field0-0-0=noop&type0-0-0=noop&value0-0-0=
> >
> > And there are 173,000 on the internet ;)
> > http://www.google.com/search?hl=en&q=linux+noapic&btnG=Google+Search
> >
> > We screwed this pooch a long time ago - years. Perhaps if some of the many
> > noapic users could run a bisection search to work out when it broke we
> > could start fixing things. But they all have a workaround so there's no
> > motivation.
>
> I have 2 SMP boards and both need noapic. One is from 2001 (ASUS CUR-DLS),
> one is from June 2006 (Gigabyte M57SLI-S4).
>
> There are many reasons:
>
> 1. Bugs which have such a simple workaround don't get much attention.
>
> 2. SMP boards are usually used for machines that just HAVE to work,
> since they are expensive. These are not consumer boards.
>
> 3. I usually had only USB problems (no IRQ) when omitting noapic.
> USB is a consumer-grade technology, and enterprise-grade
> developers don't have much interest in it (until now?).
>
> 4. IRQ routing setup is often a BIOS issue. You might be able
> to fix that by upgrading your BIOS. That often needs a Windows
> tool. Linux people don't always have (or want) access to Windows :-)
>
> I reported all the problems (starting in 2001), but no developer
> seemed interested.
>
> I can report them against the latest RC6 kernel tomorrow and put them
> into bugzilla, if we now REALLY care.
>
I believe that about two years ago we broke something which caused quite a
large number of people to need noapic. Is that the case with any of your
machines? Do you know if they run 2.6.ancient without noapic?
Thanks.
On Sat, Sep 15, 2007 at 04:08:02AM -0700, Andrew Morton wrote:
> I believe that about two years ago we broke something which caused quite a
> large number of people to need noapic. Is that the case with any of your
> machines? Do you know if they run 2.6.ancient without noapic?
My recollection is that we shifted from "Enable the apic even if the
BIOS disabled it" to "Only use the apic if the BIOS didn't disable it"
around that time, which meant that distributions could actually turn on
apic-on-up support without breaking everything. That might correspond to
what you're seeing.
--
Matthew Garrett | [email protected]
On Saturday, 15 September 2007 09:39, Andrew Morton wrote:
> On 06 Sep 2007 13:31:50 +0200 Andi Kleen <[email protected]> wrote:
>
> > Chuck Ebbert <[email protected]> writes:
> >
> > > Some systems lock up without the noapic option.
> >
> > Please find patterns: cpu type, chipsets, mainboard vendors etc.
>
> There are 48 bugs in bugzilla which mention "noapic"
>
> http://bugzilla.kernel.org/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&long_desc_type=substring&long_desc=noapic&kernel_version_type=allwordssubstr&kernel_version=&bug_status=NEW&bug_status=REOPENED&bug_status=ASSIGNED&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&regression=both&cmdtype=doit&order=Reuse+same+sort+as+last+time&field0-0-0=noop&type0-0-0=noop&value0-0-0=
>
> And there are 173,000 on the internet ;)
> http://www.google.com/search?hl=en&q=linux+noapic&btnG=Google+Search
>
> We screwed this pooch a long time ago - years. Perhaps if some of the many
> noapic users could run a bisection search to work out when it broke we
> could start fixing things. But they all have a workaround so there's no
> motivation.
Well, I think it broke soon after 2.6.9.
Please see http://bugzilla.kernel.org/show_bug.cgi?id=3639#c10
On Sat, Sep 15, 2007 at 01:08:25PM +0100, Matthew Garrett wrote:
> On Sat, Sep 15, 2007 at 04:08:02AM -0700, Andrew Morton wrote:
>
> > I believe that about two years ago we broke something which caused quite a
> > large number of people to need noapic. Is that the case with any of your
> > machines? Do you know if they run 2.6.ancient without noapic?
>
> My recollection is that we shifted from "Enable the apic even if the
> BIOS disabled it" to "Only use the apic if the BIOS didn't disable it"
> around that time, which meant that distributions could actually turn on
> apic-on-up support without breaking everything. That might correspond to
> what you're seeing.
If memory serves correctly, that was circa 2.6.10, back in these commits:
commit a068ea13d1db406e15c346e93530343f6e70184c
Author: Len Brown <[email protected]>
Date: Sun Oct 10 05:21:08 2004 -0400
[ACPI] If BIOS disabled the LAPIC, believe it by default.
"lapic" is available to force enabling the LAPIC
in the event you know more than your BIOS vendor.
http://bugzilla.kernel.org/show_bug.cgi?id=3238
commit 2fcfece90db9643b6f30a7ad343898a2871e6a81
Author: Len Brown <[email protected]>
Date: Sat Oct 9 20:12:45 2004 -0400
[ACPI] Don't enable LAPIC when the BIOS disabled it.
Doing so apparently breaks every Dell on Earth.
http://bugzilla.kernel.org/show_bug.cgi?id=3238
But those changes relate to the local APIC, which 'noapic' shouldn't
have any effect on, should it?
Dave
--
http://www.codemonkey.org.uk
Chuck,
On Thu, 2007-09-13 at 12:38 -0400, Chuck Ebbert wrote:
> On 09/10/2007 03:44 PM, Andi Kleen wrote:
> >> Yes, it has an hpet. And I tried every combination of options I could
> >> think of.
> >
> >> But, even stranger, x86_64 works (only i386 fails.)
> >
> > x86-64 has quite different time code (at least until the dyntick patches
> > currently in mm)
> >
> > Obvious thing would be to diff the boot messages and see if anything
> > jumps out (e.g. in interrupt routing).
> >
> > Or check with mm and if x86-64 is broken there too then it's likely
> > the new time code.
>
> I reported too soon that x86_64 works. It does not work, it just takes
> a bit longer before it freezes. There are message threads all over the
> place discussing this problem with the HP Pavilion tx 1000, and it seems
> the best workaround is to use the "nolapic" option instead of "noapic".
> Using that, it is totally stable _and_ there are no spurious interrupts
> that would otherwise break USB. Interrupt setup is a bit strange, though:
Can you please send me 32-bit and 64-bit boot logs of the mainline and Fedora
kernels?
tglx
Dave Jones wrote:
> If memory serves correctly, that was circa 2.6.10, back in these commits..
>
> commit a068ea13d1db406e15c346e93530343f6e70184c
> Author: Len Brown <[email protected]>
> Date: Sun Oct 10 05:21:08 2004 -0400
>
> [ACPI] If BIOS disabled the LAPIC, believe it by default.
> "lapic" is available to force enabling the LAPIC
> in the event you know more than your BIOS vendor.
> http://bugzilla.kernel.org/show_bug.cgi?id=3238
>
> commit 2fcfece90db9643b6f30a7ad343898a2871e6a81
> Author: Len Brown <[email protected]>
> Date: Sat Oct 9 20:12:45 2004 -0400
>
> [ACPI] Don't enable LAPIC when the BIOS disabled it.
> Doing so apparently breaks every Dell on Earth.
> http://bugzilla.kernel.org/show_bug.cgi?id=3238
>
>
> But those changes relate to the local APIC, which 'noapic' shouldn't
> have any effect on, should it?
If the LAPIC is disabled, then you CAN'T use the IO-APIC, right? So
wouldn't the noapic option have no effect, since the APIC is already
disabled?