2006-12-17 00:04:56

by Linus Torvalds

Subject: Re: IO-APIC + timer doesn't work



On Sun, 17 Dec 2006, Tobias Diedrich wrote:
>
> No such luck, it still panics and the APIC error is also unchanged.

Ok. I don't see anything wrong off-hand, but I'll keep the patch in the
tree in the hopes that Andi and/or Eric can see what's wrong and solve it.

If we don't find a solution, I'll have to revert it, but let's give it a
few more days.

Tobias, can you please make sure to remind me about this if nothing seems
to happen?

Thanks,

Linus


2006-12-17 05:16:58

by Eric W. Biederman

Subject: Re: IO-APIC + timer doesn't work

Linus Torvalds <[email protected]> writes:

> On Sun, 17 Dec 2006, Tobias Diedrich wrote:
>>
>> No such luck, it still panics and the APIC error is also unchanged.
>
> Ok. I don't see anything wrong off-hand, but I'll keep the patch in the
> tree in the hopes that Andi and/or Eric can see what's wrong and solve it.
>
> If we don't find a solution, I'll have to revert it, but let's give it a
> few more days.
>
> Tobias, can you please make sure to remind me about this if nothing seems
> to happen?

Just skimming for differences, the first test seems to be missing an
unmask_IO_APIC_irq(0);

It would be good to know which case was working before this change was made.

Eric

2006-12-17 05:22:47

by Eric W. Biederman

Subject: Re: IO-APIC + timer doesn't work

Linus Torvalds <[email protected]> writes:

> On Sun, 17 Dec 2006, Tobias Diedrich wrote:
>>
>> No such luck, it still panics and the APIC error is also unchanged.
>
> Ok. I don't see anything wrong off-hand, but I'll keep the patch in the
> tree in the hopes that Andi and/or Eric can see what's wrong and solve it.
>
> If we don't find a solution, I'll have to revert it, but let's give it a
> few more days.
>
> Tobias, can you please make sure to remind me about this if nothing seems
> to happen?

Actually can anyone tell me how try_apic_pin is supposed to work at
all?

It doesn't appear to be programming the io_apic.

So either I am missing something or I have found a real problem.

Eric

2006-12-17 13:10:35

by Tobias Diedrich

Subject: Re: IO-APIC + timer doesn't work

Linus Torvalds wrote:
> On Sun, 17 Dec 2006, Tobias Diedrich wrote:
> >
> > No such luck, it still panics and the APIC error is also unchanged.
>
> Ok. I don't see anything wrong off-hand, but I'll keep the patch in the
> tree in the hopes that Andi and/or Eric can see what's wrong and solve it.
>
> If we don't find a solution, I'll have to revert it, but let's give it a
> few more days.
>
> Tobias, can you please make sure to remind me about this if nothing seems
> to happen?

Sure.

BTW, I'm also wondering if this secondary Oops is supposed to happen:
http://www.tdiedrich.de/~ranma/2.6.20-rc1-oops2.jpg
I guess the NMI watchdog is never disabled after the test fails?

|[68.908000] Kernel panic - not syncing: IO-APIC + timer doesn't work! Try using the 'noapic' kernel parameter
|[68.908002]
[~4 seconds later]
|[68.908300] NMI Watchdog detected LOCKUP on CPU 0
^^^^^^^^^ wrong timestamp?
|[73.637325] CPU 0
|[73.637451] Modules linked in:
|[73.637579] Pid: 1, comm: swapper Not tainted 2.6.20-rc1-amd64 #27
[...]

2006-12-17 17:27:14

by Linus Torvalds

Subject: Re: IO-APIC + timer doesn't work



On Sun, 17 Dec 2006, Tobias Diedrich wrote:
>
> BTW, I'm also wondering if this secondary Oops is supposed to happen:

Well, if the timer doesn't work, then the NMI watchdog will trigger. So
it's "supposed" to happen in the sense that yeah, it's kind of expected,
but it's really basically just a secondary issue. If the timer worked
properly, you'd never see it.

So it's just fallout from the original problem you have, and not
interesting in itself.
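To make the fallout concrete: a watchdog of this kind only needs to notice that the timer-interrupt count has stopped advancing between NMIs. The sketch below is a simplified model under stated assumptions (one NMI per second, and a lockup is declared after five consecutive NMIs with no timer progress). The names wd_state, wd_nmi_tick, wd_simulate and the constants are hypothetical and are not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of an NMI watchdog: it fires NMI_HZ times per second
 * and declares a lockup once the timer-interrupt count has not moved for
 * LOCKUP_SECS seconds. Not kernel API; names are illustrative only.
 */
#define NMI_HZ      1
#define LOCKUP_SECS 5

struct wd_state {
    unsigned long timer_irqs;      /* incremented by the timer handler */
    unsigned long last_timer_irqs; /* count seen at the previous NMI */
    unsigned int  stuck_nmis;      /* consecutive NMIs with no progress */
};

/* Called from the (modelled) timer interrupt handler. */
static void wd_timer_irq(struct wd_state *s)
{
    s->timer_irqs++;
}

/* Called on every watchdog NMI; returns true when a lockup is declared. */
static bool wd_nmi_tick(struct wd_state *s)
{
    if (s->timer_irqs != s->last_timer_irqs) {
        /* Timer made progress since the last NMI: reset the counter. */
        s->last_timer_irqs = s->timer_irqs;
        s->stuck_nmis = 0;
        return false;
    }
    /* No timer progress: count this NMI as stuck. */
    return ++s->stuck_nmis >= LOCKUP_SECS * NMI_HZ;
}

/*
 * Run the model for `seconds` NMIs. With timer_works == false the timer
 * never fires, as when the timer interrupt is not being delivered.
 */
static bool wd_simulate(int seconds, bool timer_works)
{
    struct wd_state s = {0, 0, 0};
    bool lockup = false;

    assert(seconds >= 0);
    for (int i = 0; i < seconds; i++) {
        if (timer_works)
            wd_timer_irq(&s);
        if (wd_nmi_tick(&s))
            lockup = true;
    }
    return lockup;
}
```

With timer_works set to false, wd_simulate() reports a lockup once five NMIs pass without a tick. In this model a lockup report is the expected consequence of a dead timer rather than an independent bug.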

Linus

2006-12-18 06:17:56

by Len Brown

Subject: Re: IO-APIC + timer doesn't work

On Sunday 17 December 2006 00:22, Eric W. Biederman wrote:

> Actually can anyone tell me how try_apic_pin is supposed to work at
> all?
>
> It doesn't appear to be programming the io_apic.

magic:-)

ACPI can't even _describe_ the scenarios being tried by check_timer(),
which is trying to navigate the minefield of all possible undocumented
chipset-dependent bugs (i.e., tinkering with the PIT while in IOAPIC mode...)
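Such a fallback tries each candidate timer route and keeps one only if ticks are observed. Its general shape can be sketched as follows. This is a hypothetical model, not check_timer() itself. struct timer_route, pick_timer_route() and the stub probes are invented for illustration, and each probe is assumed to program its route before counting ticks. The "more than 4 ticks in a 10-tick window" acceptance rule is an assumption modelled on the kernel's timer_irq_works() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical "try each timer route" model. Each probe is assumed to
 * program its route and return how many ticks arrived in a window that
 * should hold about 10 ticks.
 */
typedef unsigned long (*ticks_in_window_fn)(void);

struct timer_route {
    const char *name;                   /* illustrative label only */
    ticks_in_window_fn ticks_in_window; /* program route, count ticks */
};

/* Accept a route if clearly more than a few ticks were seen (assumed rule). */
static bool route_works(const struct timer_route *r)
{
    return r->ticks_in_window() > 4;
}

/* Return the index of the first working route, or -1 if none works. */
static int pick_timer_route(const struct timer_route *routes, size_t n)
{
    assert(routes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (route_works(&routes[i]))
            return (int)i;
    return -1; /* the caller would panic at this point */
}

/* Stub probes for demonstration: a dead route and a healthy one. */
static unsigned long never_ticks(void) { return 0; }
static unsigned long ticks_ok(void)    { return 10; }
```

A route whose probe never sees ticks is skipped. If every route fails, the function returns -1, which corresponds to the point where the real kernel prints the "IO-APIC + timer doesn't work!" panic quoted earlier in the thread.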

The chipset vendors can create new bugs in this area
faster than we can fix them, and there is a reason for this.

Public info on Windows says that they use the 100 HZ 8254 PIT on IRQ0 only for UP.
On SMP, they use the 64 HZ RTC on IRQ8 instead.

This means that for the population of system vendors that validate only with Windows,
only those timer configurations get validated, and Linux using IRQ0 is exposed to HW bugs.

So the RTC looks like a safe path for a validated periodic ticker
when our first choice doesn't work. But the RTC isn't without problems.
It can tick only at power-of-2 rates, and 100/250/300/1000 HZ are not powers of 2.
Dunno if close counts -- 256 is close to our 250, and 1024 is close to our 1000,
but we don't have any choices close to 100 or 300 HZ.
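A short sketch can check the arithmetic behind "close counts". It assumes an MC146818-compatible RTC with a 32.768 kHz time base. In that part, the rate-select value RS (low bits of register A) gives a periodic rate of 32768 >> (RS - 1) Hz for RS 3..15. Only that range is considered here. struct rtc_choice and the helper functions are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Sketch: choose the RTC periodic-interrupt rate closest to a desired HZ,
 * assuming rate = 32768 >> (RS - 1) Hz for RS in 3..15 (8192 Hz .. 2 Hz).
 */
struct rtc_choice {
    unsigned int rs;   /* assumed rate-select value for register A */
    unsigned int freq; /* resulting periodic interrupt rate in Hz */
};

static unsigned int rtc_rate_for_rs(unsigned int rs)
{
    assert(rs >= 3 && rs <= 15);
    return 32768u >> (rs - 1);
}

/* Scan every supported rate and keep the one nearest to hz. */
static struct rtc_choice rtc_closest_rate(unsigned int hz)
{
    struct rtc_choice best = {15, rtc_rate_for_rs(15)};

    assert(hz > 0);
    for (unsigned int rs = 3; rs <= 15; rs++) {
        unsigned int f = rtc_rate_for_rs(rs);
        if (abs((int)f - (int)hz) < abs((int)best.freq - (int)hz)) {
            best.rs = rs;
            best.freq = f;
        }
    }
    return best;
}

/* Relative tick-rate error in parts per thousand, rounded down. */
static unsigned int rtc_error_permille(unsigned int hz)
{
    struct rtc_choice c = rtc_closest_rate(hz);
    unsigned int diff = c.freq > hz ? c.freq - hz : hz - c.freq;

    return diff * 1000u / hz;
}
```

For HZ=250 and HZ=1000 the nearest RTC rates are 256 and 1024 Hz, about 2.4% off. For HZ=100 the best choice is 128 Hz (28% off), and for HZ=300 it is 256 Hz (about 14.7% off). Under the same assumption, the 64 Hz rate mentioned for Windows corresponds to RS=10.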

-Len

ps.
Moving to the RTC from the PIT would move us 3 years forward
in hardware technology, from 1981 to 1984:-)