Linus & Marcelo,
Here is a patch which fixes a long-standing bug in the x86 local APIC
code. The patch applies to both 2.4.17 and 2.5.1. Please apply.
The Intel P6 local APIC internally signals an Illegal Vector error
whenever a zero vector is written to an LVT entry, even if the entry
is simultaneously masked. The bug is that apic.c triggers these errors
when the contents of LVTERR is defined by the BIOS and not the kernel,
which can cause unexpected interrupts on unknown vectors. This typically
happens at boot-time initialisation, PM suspend, and PM resume.
The patch eliminates the problem by changing the initialisation order
in apic.c's clear_local_APIC() and apic_pm_resume() to ensure that LVTERR
is masked when we write (potentially) null vectors to LVT entries.
(Non-broken UP BIOSen often boot the kernel with LVTERR null and masked,
and leave LVTERR alone at PM suspend/resume, so _usually_ the errors just
show up as annoying kernel messages. It is, however, an extremely bad
idea to rely on the BIOS to mask errors caused the kernel itself.)
/Mikael
--- linux-2.4.17-apicfixes/arch/i386/kernel/apic.c.~1~ Fri Nov 23 22:40:14 2001
+++ linux-2.4.17-apicfixes/arch/i386/kernel/apic.c Sun Dec 23 15:09:06 2001
@@ -56,6 +56,14 @@
maxlvt = get_maxlvt();
/*
+ * Masking an LVT entry on a P6 can trigger a local APIC error
+ * if the vector is zero. Mask LVTERR first to prevent this.
+ */
+ if (maxlvt >= 3) {
+ v = ERROR_APIC_VECTOR; /* any non-zero vector will do */
+ apic_write_around(APIC_LVTERR, v | APIC_LVT_MASKED);
+ }
+ /*
* Careful: we have to set masks only first to deassert
* any level-triggered sources.
*/
@@ -65,10 +73,6 @@
apic_write_around(APIC_LVT0, v | APIC_LVT_MASKED);
v = apic_read(APIC_LVT1);
apic_write_around(APIC_LVT1, v | APIC_LVT_MASKED);
- if (maxlvt >= 3) {
- v = apic_read(APIC_LVTERR);
- apic_write_around(APIC_LVTERR, v | APIC_LVT_MASKED);
- }
if (maxlvt >= 4) {
v = apic_read(APIC_LVTPC);
apic_write_around(APIC_LVTPC, v | APIC_LVT_MASKED);
@@ -84,6 +88,8 @@
apic_write_around(APIC_LVTERR, APIC_LVT_MASKED);
if (maxlvt >= 4)
apic_write_around(APIC_LVTPC, APIC_LVT_MASKED);
+ apic_write(APIC_ESR, 0);
+ v = apic_read(APIC_ESR);
}
void __init connect_bsp_APIC(void)
@@ -480,6 +486,7 @@
l &= ~MSR_IA32_APICBASE_BASE;
l |= MSR_IA32_APICBASE_ENABLE | APIC_DEFAULT_PHYS_BASE;
wrmsr(MSR_IA32_APICBASE, l, h);
+ apic_write(APIC_LVTERR, ERROR_APIC_VECTOR | APIC_LVT_MASKED);
apic_write(APIC_ID, apic_pm_state.apic_id);
apic_write(APIC_DFR, apic_pm_state.apic_dfr);
apic_write(APIC_LDR, apic_pm_state.apic_ldr);
@@ -487,15 +494,15 @@
apic_write(APIC_SPIV, apic_pm_state.apic_spiv);
apic_write(APIC_LVT0, apic_pm_state.apic_lvt0);
apic_write(APIC_LVT1, apic_pm_state.apic_lvt1);
+ apic_write(APIC_LVTPC, apic_pm_state.apic_lvtpc);
+ apic_write(APIC_LVTT, apic_pm_state.apic_lvtt);
+ apic_write(APIC_TDCR, apic_pm_state.apic_tdcr);
+ apic_write(APIC_TMICT, apic_pm_state.apic_tmict);
apic_write(APIC_ESR, 0);
apic_read(APIC_ESR);
apic_write(APIC_LVTERR, apic_pm_state.apic_lvterr);
apic_write(APIC_ESR, 0);
apic_read(APIC_ESR);
- apic_write(APIC_LVTPC, apic_pm_state.apic_lvtpc);
- apic_write(APIC_LVTT, apic_pm_state.apic_lvtt);
- apic_write(APIC_TDCR, apic_pm_state.apic_tdcr);
- apic_write(APIC_TMICT, apic_pm_state.apic_tmict);
__restore_flags(flags);
if (apic_pm_state.perfctr_pmdev)
pm_send(apic_pm_state.perfctr_pmdev, PM_RESUME, data);
On Sun, 23 Dec 2001, Mikael Pettersson wrote:
> The Intel P6 local APIC internally signals an Illegal Vector error
> whenever a zero vector is written to an LVT entry, even if the entry
> is simultaneously masked. The bug is that apic.c triggers these errors
> when the contents of LVTERR is defined by the BIOS and not the kernel,
> which can cause unexpected interrupts on unknown vectors. This typically
> happens at boot-time initialisation, PM suspend, and PM resume.
>
> The patch eliminates the problem by changing the initialisation order
> in apic.c's clear_local_APIC() and apic_pm_resume() to ensure that LVTERR
> is masked when we write (potentially) null vectors to LVT entries.
[...]
> @@ -84,6 +88,8 @@
> apic_write_around(APIC_LVTERR, APIC_LVT_MASKED);
> if (maxlvt >= 4)
> apic_write_around(APIC_LVTPC, APIC_LVT_MASKED);
> + apic_write(APIC_ESR, 0);
> + v = apic_read(APIC_ESR);
> }
You mustn't access the ESR unconditionally -- it doesn't exist in the
i82489DX. Also check the Pentium erratum 3AP for possible issues (I can't
recall if writing is safe -- possibly only the contents are lost).
Also note that's really an APIC and not a kernel bug -- writing a
previously read value to a register is defined as valid by Intel.
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +
On Thu, 3 Jan 2002 15:57:34 +0100 (MET), Maciej W. Rozycki wrote:
>> @@ -84,6 +88,8 @@
>> apic_write_around(APIC_LVTERR, APIC_LVT_MASKED);
>> if (maxlvt >= 4)
>> apic_write_around(APIC_LVTPC, APIC_LVT_MASKED);
>> + apic_write(APIC_ESR, 0);
>> + v = apic_read(APIC_ESR);
>> }
>
> You mustn't access the ESR unconditionally -- it doesn't exist in the
>i82489DX. Also check the Pentium erratum 3AP for possible issues (I can't
>recall if writing is safe -- possibly only the contents are lost).
>
> Also note that's really an APIC and not a kernel bug -- writing a
>previously read value to a register is defined as valid by Intel.
Can you quote chapter and verse on that last statement?
(Preferably from IA32 Vol3.) Writes to APIC registers obviously
_do_ cause side-effects in some cases...
Pentium erratum 3AP can be ignored, since the whole purpose of the
two ESR accesses is to clear ESR. Unfortunately, Pentium erratum 11AP
does apply, since there are back-to-back APIC writes (first to LVTERR,
then to ESR).
The patch below fixes the 11AP and 82489DX issues. It has the usual test
before the write to ESR, which avoids both erratum 11AP (should the code
be moved around) and 3AP (redundantly, to avoid confusion).
(Linus, please apply.)
/Mikael
--- linux-2.5.2-pre6/arch/i386/kernel/apic.c.~1~ Thu Jan 3 19:08:00 2002
+++ linux-2.5.2-pre6/arch/i386/kernel/apic.c Thu Jan 3 19:58:26 2002
@@ -88,8 +88,12 @@
apic_write_around(APIC_LVTERR, APIC_LVT_MASKED);
if (maxlvt >= 4)
apic_write_around(APIC_LVTPC, APIC_LVT_MASKED);
- apic_write(APIC_ESR, 0);
- v = apic_read(APIC_ESR);
+ v = GET_APIC_VERSION(apic_read(APIC_LVR));
+ if (APIC_INTEGRATED(v)) { /* !82489DX */
+ if (maxlvt > 3) /* Due to Pentium errata 3AP and 11AP. */
+ apic_write(APIC_ESR, 0);
+ apic_read(APIC_ESR);
+ }
}
void __init connect_bsp_APIC(void)
On Thu, 3 Jan 2002, Mikael Pettersson wrote:
> > Also note that's really an APIC and not a kernel bug -- writing a
> >previously read value to a register is defined as valid by Intel.
>
> Can you quote chapter and verse on that last statement?
> (Preferably from IA32 Vol3.) Writes to APIC registers obviously
> _do_ cause side-effects in some cases...
Obviously I confused defined bits with reserved ones, sorry. Still the
APIC looks insane for me -- it should really signal an error only once an
interrupt is received, like it does for interrupts from remote sources.
> Pentium erratum 3AP can be ignored, since the whole purpose of the
> two ESR accesses is to clear ESR. Unfortunately, Pentium erratum 11AP
> does apply, since there are back-to-back APIC writes (first to LVTERR,
> then to ESR).
I checked 3AP and you are right, since we don't need the ESR's value.
> + if (maxlvt > 3) /* Due to Pentium errata 3AP and 11AP. */
> + apic_write(APIC_ESR, 0);
Use apic_write_around() instead as the 11AP workaround -- it was
introduced specifically for this purpose. Using anything else doesn't
guarantee no back-to-back APIC writes due to interrupts (specifically
writes to the EOI register).
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +
On Fri, 4 Jan 2002 13:12:44 +0100 (MET), Maciej W. Rozycki wrote:
> Still the
>APIC looks insane for me -- it should really signal an error only once an
>interrupt is received, like it does for interrupts from remote sources.
But a remote interrupt source supplies a vector, doesn't it? If we
assume that the local APIC checks the vector on arrival _of_the_vector_,
then it does make some sense that it also flags a null vector written
to one of the LVT entries as an error. The bug really is that it forgets
to take the mask bit into account. However, the behaviour _is_ there and
we have to avoid triggering it.
>> + if (maxlvt > 3) /* Due to Pentium errata 3AP and 11AP. */
>> + apic_write(APIC_ESR, 0);
>
> Use apic_write_around() instead as the 11AP workaround -- it was
>introduced specifically for this purpose. Using anything else doesn't
>guarantee no back-to-back APIC writes due to interrupts (specifically
>writes to the EOI register).
I disagree. The write doesn't occur on P5s, so 11AP doesn't apply.
If I had used an unconditional write_around() instead, then someone
would complain that I'm violating erratum 3AP (even though it doesn't
matter in this case). The test as written unambiguously handles both
3AP and 11AP, and is identical to the "clear ESR" code in
setup_local_APIC() around line 385.
/Mikael
On Fri, 4 Jan 2002, Mikael Pettersson wrote:
> But a remote interrupt source supplies a vector, doesn't it? If we
> assume that the local APIC checks the vector on arrival _of_the_vector_,
> then it does make some sense that it also flags a null vector written
> to one of the LVT entries as an error. The bug really is that it forgets
A local interrupt arriving when masked can be treated as one that's not
destined to the APIC, hence no error should be signalled. And certainly
not at the LVT write time (especially as 0x00010000 is the power-up
default as well, possibly used by software predating integrated local
APICs).
> to take the mask bit into account. However, the behaviour _is_ there and
> we have to avoid triggering it.
Sure, but it'd better be marked as an APIC weirdness workaround or
otherwise code seems looks clueless. The weirdness is nowhere documented
and the code is to be read by others -- the fact we know what is happening
here doesn't help much.
> >> + if (maxlvt > 3) /* Due to Pentium errata 3AP and 11AP. */
> >> + apic_write(APIC_ESR, 0);
> >
> > Use apic_write_around() instead as the 11AP workaround -- it was
> >introduced specifically for this purpose. Using anything else doesn't
> >guarantee no back-to-back APIC writes due to interrupts (specifically
> >writes to the EOI register).
>
> I disagree. The write doesn't occur on P5s, so 11AP doesn't apply.
But the comment is misleading. With an apic_write_around() no comment is
needed (the code looks obvious) and the code is a bit smaller.
> If I had used an unconditional write_around() instead, then someone
> would complain that I'm violating erratum 3AP (even though it doesn't
> matter in this case). The test as written unambiguously handles both
If someone digs deeply enough to check what 3AP is, he should know why it
doesn't apply.
> 3AP and 11AP, and is identical to the "clear ESR" code in
> setup_local_APIC() around line 385.
It's not identical -- the code in setup_local_APIC() doesn't discard what
is read from ESR so it can't do apic_write_around(). Note that you must
write to ESR before reading it on a P6 and you must not write to ESR
before reading it on a Pentium if you want to retrieve meaningful data.
If you only want to clear ESR, you just need to ensure there is a
write-read sequence somewhere.
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +