2002-11-13 22:22:26

by Mikael Pettersson

Subject: local APIC may cause XFree86 hang

Yesterday I installed a Radeon 8500 in a box. Suddenly the box
consistently hung hard every time I tried to shut down XFree86.

It turned out to be the local APIC timer: with it enabled,
the hangs occur; with it disabled but with the rest of the
local APIC and the performance counters in use, there are no
problems at all.

Does XFree86 (its core or particular drivers) use vm86() to
invoke, possibly graphics card specific, BIOS code?
That would explain the hangs I got. The fix would be to
disable the local APIC around vm86()'s BIOS calls, just like
we now disable it before APM suspend.

Doesn't the PCI code also do BIOS calls?

/Mikael


2002-11-13 22:47:45

by Alan

Subject: Re: local APIC may cause XFree86 hang

On Wed, 2002-11-13 at 22:29, Mikael Pettersson wrote:
> Does XFree86 (its core or particular drivers) use vm86() to
> invoke, possibly graphics card specific, BIOS code?
> That would explain the hangs I got. The fix would be to
> disable the local APIC around vm86()'s BIOS calls, just like
> we now disable it before APM suspend.

It does, yes.

2002-11-13 22:54:34

by Mikael Pettersson

Subject: Re: local APIC may cause XFree86 hang

Alan Cox writes:
> On Wed, 2002-11-13 at 22:29, Mikael Pettersson wrote:
> > Does XFree86 (its core or particular drivers) use vm86() to
> > invoke, possibly graphics card specific, BIOS code?
> > That would explain the hangs I got. The fix would be to
> > disable the local APIC around vm86()'s BIOS calls, just like
> > we now disable it before APM suspend.
>
> It does, yes.

Ok. I'll start working on a patch to vm86() tomorrow.

/Mikael

2002-11-13 23:46:20

by Linus Torvalds

Subject: Re: local APIC may cause XFree86 hang

In article <[email protected]>,
Mikael Pettersson <[email protected]> wrote:
>
>Does XFree86 (its core or particular drivers) use vm86() to
>invoke, possibly graphics card specific, BIOS code?
>That would explain the hangs I got. The fix would be to
>disable the local APIC around vm86()'s BIOS calls, just like
>we now disable it before APM suspend.

It does.

HOWEVER, vm86() mode is very very different from APM, which uses real
mode. External interrupts in vm86 mode will not be taken inside vm86
mode - and disabling the local timer (by disabling the APIC) around a
vm86 mode is definitely _not_ a good idea, since it would be an instant
denial-of-service attack on SMP machines (the PIT timer only goes to
CPU0, so we depend on the local timer to do process timeouts etc on
other CPUs). The vm86 code might just be looping forever.

In other words, if it is really vm86-related, then
(a) it's a CPU bug
(b) we're screwed

I bet it's something else. Possibly just timing-specific (the APIC
makes interrupts much faster), but also possibly something to do with
the VGA interrupt (some XFree86 drivers actually use the gfx interrupts
these days)

Linus

2002-11-14 00:23:12

by Nakajima, Jun

Subject: RE: local APIC may cause XFree86 hang



Are we preventing vm86 code from accessing the PIT or PIC? I have seen some
video ROM code (entered via either a BIOS call or a far call) access the PIT,
confusing the OS.

Thanks,
Jun

2002-11-14 00:41:19

by Linus Torvalds

Subject: RE: local APIC may cause XFree86 hang


On Wed, 13 Nov 2002, Nakajima, Jun wrote:
>
> Are we disabling vm86 code to access to PIT or PIC? I saw some video ROM
> code (either BIOS call or far call) did access PIT, confusing the OS.

Well, the kernel itself doesn't actually disable/enable anything, it
leaves that decision to the caller.

XFree86 obviously does have IO rights, and I suspect it may allow the
video BIOS to do just about anything, simply because it doesn't have much
choice (the video bios clearly needs a lot of IO privileges too). So yes,
that could easily confuse the OS if it happens, but it should be
independent of IO-APIC vs not.

Linus

2002-11-14 01:09:40

by Nakajima, Jun

Subject: RE: local APIC may cause XFree86 hang


The one instance I saw was a BIOS reading the 8254 in a tight loop for
calibration purposes, and it assumed that time advanced at a constant rate
in order to exit the loop. In other words, it never assumed it could be
interrupted. To vm86 code, interrupts are invisible, but they do affect
the actual speed.
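
A toy simulation of the kind of loop described above, assuming the exit
test compares against one exact counter value. The actual ROM code is not
shown in this thread, so the structure and every name below are
hypothetical, and the "counter" is a plain variable rather than real port
I/O. It only illustrates how time stolen by an interrupt can make such a
loop miss its exit condition.

```c
#include <stdint.h>

/* Toy simulation, not real port I/O: each poll of the counter costs one
 * tick, and a simulated interrupt steals `stolen` extra ticks at poll
 * index `irq_at`. All names here are hypothetical. */
struct sim {
    uint16_t count;    /* simulated 8254 down-counter */
    long polls;        /* number of polls performed so far */
    long irq_at;       /* poll index at which an interrupt hits (-1 = never) */
    uint16_t stolen;   /* ticks that elapse while the interrupt is handled */
};

static uint16_t sim_read(struct sim *s)
{
    s->count -= 1;                  /* time advances one tick per poll */
    if (s->polls == s->irq_at) {
        s->count -= s->stolen;      /* time jumps while the handler runs */
    }
    s->polls++;
    return s->count;
}

/* Fragile exit test: waits for one exact counter value.
 * Returns the number of polls used, or -1 if max_polls ran out. */
static long wait_exact(struct sim *s, uint16_t target, long max_polls)
{
    while (s->polls < max_polls) {
        if (sim_read(s) == target) {
            return s->polls;
        }
    }
    return -1;
}

/* Tolerant exit test: leaves as soon as the counter has passed the target. */
static long wait_passed(struct sim *s, uint16_t target, long max_polls)
{
    while (s->polls < max_polls) {
        if (sim_read(s) <= target) {
            return s->polls;
        }
    }
    return -1;
}
```

Starting at a count of 1000 with a target of 900, wait_exact() finishes
when no interrupt occurs, but gives up if an interrupt steals 100 ticks at
poll 50, because the counter jumps straight over 900. wait_passed()
survives the same jump.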

Jun

2002-11-14 01:31:41

by Linus Torvalds

Subject: RE: local APIC may cause XFree86 hang


On Wed, 13 Nov 2002, Nakajima, Jun wrote:
>
> The one instance I saw was that the BIOS was reading 8254 in a tight loop
> for a calibration purpose, and it was assuming the time proceeded in a
> constant speed, to exit the loop. In other words, it never assumed it could
> get interrupts. To vm86, interrupts are invisible, but they have impacts on
> the actual speed.

That sounds like a perfectly ok thing to do - apart from the hw latching
which might confuse the kernel.

When enabling the local APIC, Linux doesn't actually disable legacy PIT
interrupts, so again I don't really see what the apparent connection
between the hang and the APIC is. So I'd still suspect it's more
timing-related than anything else.

Linus

2002-11-14 07:52:02

by Gregoire Favre

Subject: Re: local APIC may cause XFree86 hang

Hello,

I don't know if it's related or not, but under .47, if I enable fb,
switching from X to the console hangs my system; with just the VGA console
I don't have this problem (and anyway, with my Mach 64 card the console has
been unusable for many 2.5 releases: white is replaced by an illegible
blue???).

Have a great day,

Grégoire
________________________________________________________________
http://ulima.unil.ch/greg ICQ:16624071 mailto:[email protected]

2002-11-14 08:02:24

by George Anzinger

Subject: Re: local APIC may cause XFree86 hang

Linus Torvalds wrote:
>
> On Wed, 13 Nov 2002, Nakajima, Jun wrote:
> >
> > The one instance I saw was that the BIOS was reading 8254 in a tight loop
> > for a calibration purpose, and it was assuming the time proceeded in a
> > constant speed, to exit the loop. In other words, it never assumed it could
> > get interrupts. To vm86, interrupts are invisible, but they have impacts on
> > the actual speed.
>
> That sounds like a perfectly ok thing to do - apart from the hw latching
> which might confuse the kernel.

Yes, it has been speculated that some "time warps" were
caused by "someone" reading only one of the two bytes from
the PIT. It puts the following reads out of sync. If this
was caused by an interrupt (which, of course, is where the
PIT is read by the kernel) between two reads, it could well
cause the "time warps" that have been observed.
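
For illustration only, here is a toy C model of that failure. It is not
kernel code: the struct and function names are made up, and unlike the
real 8254 the latch here always re-captures the count. It shows how a
single stray one-byte read flips the LSB/MSB byte selector, so every later
two-byte read assembles its bytes in the wrong order.

```c
#include <stdint.h>

/* Toy model of one 8254 channel in "LSB then MSB" access mode.
 * Hypothetical names; simplified, not hardware-accurate. */
struct toy_pit {
    uint16_t count;     /* live down-counter value */
    uint16_t latched;   /* value captured by the latch command */
    int next_is_msb;    /* byte selector: 0 = LSB next, 1 = MSB next */
};

/* Latch command: capture the current count for the following reads.
 * Simplification: always re-captures, even if a latch is pending. */
static void pit_latch(struct toy_pit *p)
{
    p->latched = p->count;
}

/* Read one byte of the latched value and toggle the byte selector. */
static uint8_t pit_read_byte(struct toy_pit *p)
{
    uint8_t b = p->next_is_msb ? (uint8_t)(p->latched >> 8)
                               : (uint8_t)(p->latched & 0xff);
    p->next_is_msb = !p->next_is_msb;
    return b;
}

/* What a well-behaved reader does: latch, then read two bytes,
 * trusting that the first is the LSB and the second is the MSB. */
static uint16_t pit_read_count(struct toy_pit *p)
{
    pit_latch(p);
    uint8_t first = pit_read_byte(p);   /* assumed to be the LSB */
    uint8_t second = pit_read_byte(p);  /* assumed to be the MSB */
    return (uint16_t)(first | (second << 8));
}
```

With the count held at 0x1234, pit_read_count() returns 0x1234 while in
sync. After one pit_latch() followed by a single pit_read_byte(), it
returns 0x3412 on every later call, until some other odd-numbered read
happens to flip the selector back.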

George
>
> When enabling the local APIC, Linux doesn't actually disable legacy PIT
> interrupts, so again I don't really see what the apparent connection
> between the hang and the APIC is. So I'd still suspect it's more
> timing-related than anything else.
>
> Linus

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-11-14 13:54:42

by Alan

Subject: Re: local APIC may cause XFree86 hang

On Thu, 2002-11-14 at 08:08, george anzinger wrote:
> Linus Torvalds wrote:
> >
> > On Wed, 13 Nov 2002, Nakajima, Jun wrote:
> > >
> > > The one instance I saw was that the BIOS was reading 8254 in a tight loop
> > > for a calibration purpose, and it was assuming the time proceeded in a
> > > constant speed, to exit the loop. In other words, it never assumed it could
> > > get interrupts. To vm86, interrupts are invisible, but they have impacts on
> > > the actual speed.
> >
> > That sounds like a perfectly ok thing to do - apart from the hw latching
> > which might confuse the kernel.
>
> Yes, it has been speculated that some "time warps" were
> caused by "someone" reading only one of the two bytes from
> the PIT. It puts the following reads out of sync. If this
> was caused by an interrupt (which, of course, is where the
> PIT is read by the kernel) between two reads, it could well
> cause the "time warps" that have been observed.

The 2.5 kernel also has another bug in the way it handles a latch read that
returns the exact count value the counter wraps on. Some chips expose that value
momentarily before resetting