In setup_APIC_timer with the HPET in use, a condition can arise while
waiting for the next irq slice to expire on the HPET which will either
cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
watchdog is disabled.
The HPET comparator and the counter keep incrementing during its normal
operation. When a comparison event fires the comparator will increment
by the designated period. If the HPET trigger occurs right after
the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
counter to wrap around. However, when the NMI watchdog is enabled the
NMI watchdog will detect a lockup and reboot the machine. This
scenario can be exasperated by the presence of an SMI which will
increase the window of opportunity for the condition to occur.
The fix is to wait for the compartor to change which signals the
end of the tick slice.
---
The last patch had a typo in the diff which really would cause the
problem state above. Sorry.
diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
index 900ff38..06797e2 100644
--- a/arch/x86_64/kernel/apic.c
+++ b/arch/x86_64/kernel/apic.c
@@ -791,10 +791,12 @@ static void setup_APIC_timer(unsigned in
/* wait for irq slice */
if (hpet_address && hpet_use_timer) {
+ /*
+ * Wait for the comparator value to change which signals that
+ * the tick slice has expired.
+ */
int trigger = hpet_readl(HPET_T0_CMP);
- while (hpet_readl(HPET_COUNTER) >= trigger)
- /* do nothing */ ;
- while (hpet_readl(HPET_COUNTER) < trigger)
+ while (trigger == hpet_readl(HPET_T0_CMP))
/* do nothing */ ;
} else {
int c1, c2;
On Thursday 09 August 2007 01:17:19 Aaron Durbin wrote:
> In setup_APIC_timer with the HPET in use, a condition can arise while
> waiting for the next irq slice to expire on the HPET which will either
> cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
> watchdog is disabled.
>
> The HPET comparator and the counter keep incrementing during its normal
> operation. When a comparison event fires the comparator will increment
> by the designated period. If the HPET trigger occurs right after
> the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
> for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
> counter to wrap around. However, when the NMI watchdog is enabled the
> NMI watchdog will detect a lockup and reboot the machine. This
> scenario can be exasperated by the presence of an SMI which will
> increase the window of opportunity for the condition to occur.
Nasty.
Iirc the clockevents code did away with the synchronization
completely because Thomas determined it was useless because long term
it'll drift anyways. So perhaps it could be just removed?
-Andi
>
On 8/8/07, Andi Kleen <[email protected]> wrote:
> On Thursday 09 August 2007 01:17:19 Aaron Durbin wrote:
> > In setup_APIC_timer with the HPET in use, a condition can arise while
> > waiting for the next irq slice to expire on the HPET which will either
> > cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
> > watchdog is disabled.
> >
> > The HPET comparator and the counter keep incrementing during its normal
> > operation. When a comparison event fires the comparator will increment
> > by the designated period. If the HPET trigger occurs right after
> > the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
> > for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
> > counter to wrap around. However, when the NMI watchdog is enabled the
> > NMI watchdog will detect a lockup and reboot the machine. This
> > scenario can be exasperated by the presence of an SMI which will
> > increase the window of opportunity for the condition to occur.
>
> Nasty.
>
> Iirc the clockevents code did away with the synchronization
> completely because Thomas determined it was useless because long term
> it'll drift anyways. So perhaps it could be just removed?
>
I was thinking along the same lines as well, but I really didn't know how
important all that code was for waiting for the next irq slice. I'm not time
expert, but I would imagine we would resynchronize correctly in the future after
this code path?
>
> I was thinking along the same lines as well, but I really didn't know how
> important all that code was for waiting for the next irq slice. I'm not time
> expert, but I would imagine we would resynchronize correctly in the future after
> this code path?
The idea is to just have the apic timers roughly at the same times + some fixed
offset as the main timer. Otherwise e.g. add_timer could jitter up to a jiffie
regarding the jiffies counter. But long term drift will destroy the synchronization
anyways.
I guess i'll send it in anyways for .23; it would be too risky to change anything
radical in this area that late. Then later it can be revisited.
-Andi
On Wed, 8 Aug 2007 16:17:19 -0700
Aaron Durbin <[email protected]> wrote:
> In setup_APIC_timer with the HPET in use, a condition can arise while
> waiting for the next irq slice to expire on the HPET which will either
> cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
> watchdog is disabled.
>
> The HPET comparator and the counter keep incrementing during its normal
> operation. When a comparison event fires the comparator will increment
> by the designated period. If the HPET trigger occurs right after
> the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
> for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
> counter to wrap around. However, when the NMI watchdog is enabled the
> NMI watchdog will detect a lockup and reboot the machine. This
> scenario can be exasperated by the presence of an SMI which will
> increase the window of opportunity for the condition to occur.
>
> The fix is to wait for the compartor to change which signals the
> end of the tick slice.
>
> ---
>
> The last patch had a typo in the diff which really would cause the
> problem state above. Sorry.
>
> diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
> index 900ff38..06797e2 100644
> --- a/arch/x86_64/kernel/apic.c
> +++ b/arch/x86_64/kernel/apic.c
> @@ -791,10 +791,12 @@ static void setup_APIC_timer(unsigned in
>
> /* wait for irq slice */
> if (hpet_address && hpet_use_timer) {
> + /*
> + * Wait for the comparator value to change which signals that
> + * the tick slice has expired.
> + */
> int trigger = hpet_readl(HPET_T0_CMP);
> - while (hpet_readl(HPET_COUNTER) >= trigger)
> - /* do nothing */ ;
> - while (hpet_readl(HPET_COUNTER) < trigger)
> + while (trigger == hpet_readl(HPET_T0_CMP))
> /* do nothing */ ;
> } else {
> int c1, c2;
but, but. Didn't this get fixed by
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch
?
On 8/9/07, Andrew Morton <[email protected]> wrote:
> On Wed, 8 Aug 2007 16:17:19 -0700
> Aaron Durbin <[email protected]> wrote:
>
> > In setup_APIC_timer with the HPET in use, a condition can arise while
> > waiting for the next irq slice to expire on the HPET which will either
> > cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
> > watchdog is disabled.
> >
> > The HPET comparator and the counter keep incrementing during its normal
> > operation. When a comparison event fires the comparator will increment
> > by the designated period. If the HPET trigger occurs right after
> > the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
> > for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
> > counter to wrap around. However, when the NMI watchdog is enabled the
> > NMI watchdog will detect a lockup and reboot the machine. This
> > scenario can be exasperated by the presence of an SMI which will
> > increase the window of opportunity for the condition to occur.
> >
> > The fix is to wait for the compartor to change which signals the
> > end of the tick slice.
> >
> > ---
> >
> > The last patch had a typo in the diff which really would cause the
> > problem state above. Sorry.
> >
> > diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
> > index 900ff38..06797e2 100644
> > --- a/arch/x86_64/kernel/apic.c
> > +++ b/arch/x86_64/kernel/apic.c
> > @@ -791,10 +791,12 @@ static void setup_APIC_timer(unsigned in
> >
> > /* wait for irq slice */
> > if (hpet_address && hpet_use_timer) {
> > + /*
> > + * Wait for the comparator value to change which signals that
> > + * the tick slice has expired.
> > + */
> > int trigger = hpet_readl(HPET_T0_CMP);
> > - while (hpet_readl(HPET_COUNTER) >= trigger)
> > - /* do nothing */ ;
> > - while (hpet_readl(HPET_COUNTER) < trigger)
> > + while (trigger == hpet_readl(HPET_T0_CMP))
> > /* do nothing */ ;
> > } else {
> > int c1, c2;
>
> but, but. Didn't this get fixed by
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch
> ?
>
That original patch looks good to me as well. I just checked git head to see if
this was already fixed upstream, and it didn't appear to be. Sorry for the
duplicate patch.
-Aaron
On Thu, 9 Aug 2007 14:16:16 -0700
"Aaron Durbin" <[email protected]> wrote:
> > but, but. Didn't this get fixed by
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch
> > ?
> >
>
> That original patch looks good to me as well. I just checked git head to see if
> this was already fixed upstream, and it didn't appear to be. Sorry for the
> duplicate patch.
Yeah, sorry, I'm being a bit more batch-oriented in getting stuff into
Linus nowadays. I seem to have 44 patches to go and another 20 which
should go, but which belong to subsystem maintainers. We'll get there.